(54) Title: GENES AMPLIFIED IN CANCER CELLS
(57) Abstract 

New methods are disclosed for detecting cancer associated
genes, and obtaining corresponding cDNA sequences. The
methods involve supplying RNA preparations from control cells,
and from a plurality of different cancer cells that share a
duplicated or deleted gene in the same region of a chromosome.
Amplified cDNA copies are displayed, and then selected based
on differences in abundance of RNA between preparations.
Optional additional screening steps involve surveying panels of
cancer cells using the cDNA for RNA overabundance with or
without gene duplication. The identified genes can be used
in turn to develop materials and techniques for diagnosing and
treating the underlying cancer. Four novel genes associated with
cancer have been identified. In at least about 60% of the breast
cancer cell lines tested, RNA hybridizing with the cDNAs were
substantially more abundant than in normal cells. Most of the
cell lines also showed a duplication of the corresponding gene,
which probably contributed to the increased level of RNA in the
cell. However, for each of the four genes, there were some cell
lines which had RNA overabundance without gene duplication.
This suggests that the gene product is sufficiently important
to the cancer process that cells will use several alternative
mechanisms to achieve increased expression.
This application claims the priority benefit of the following U.S. Patent applications:

60/015,167, filed April 9, 1996; 60/019,202, filed June 6, 1996; 08/678,280, filed July 10, 1996. For
purposes of prosecution in the U.S., the aforementioned applications are hereby incorporated herein
by reference in their entirety.

Technical Field

The present invention relates generally to the field of human genetics. More specifically, it
relates to the identification of novel genes associated with overabundance of RNA in human cancer
such as breast cancer. It pertains especially to those genes and the products thereof which may be
important in diagnosis and treatment.

Background of the Invention

Cancer is a heterogeneous disease. It manifests itself in a wide variety of tissue sites, with
different degrees of de-differentiation, invasiveness, and aggressiveness. Some forms of cancer
are responsive to traditional modes of therapy, but many are not. For most common cancers, there
is a pressing need to improve the arsenal of therapies available to provide more precise and more
effective treatment in a less invasive way.

As an example, breast cancer has an unsatisfactory morbidity and mortality, despite
presently available forms of medical intervention. Traditional clinical initiatives are focused on early
diagnosis, followed by surgery and chemotherapy. Such interventions are of limited success,
particularly in patients where the tumor has undergone metastasis.

The heterogeneous nature of cancer arises because different cancer cells achieve their
growth and pathological properties by different phenotypic alterations. Alteration of gene
expression is intimately related to the uncontrolled growth and de-differentiation that are hallmarks
of cancer. Certain similar phenotypic alterations in turn may have a different genetic base in
different tumors. Yet, the number of genes central to the malignant process must be a finite one.
Accordingly, new pharmaceuticals that are tailored to specific genetic alterations in an individual
tumor may be more effective.

There are two types of altered gene expression that take place, together or independently,
in different cancer cells (reviewed by Bishop). The first type is the decreased expression of
recessive genes, known as tumor suppressor genes, that apparently act to prevent malignant
growth. The second type is the increased expression of dominant genes, such as oncogenes, that
act to promote malignant growth, or to provide some other phenotype critical for malignancy. Thus,
alteration in the expression of either type of gene is a potential diagnostic indicator. Furthermore, a
treatment strategy might seek to reinstate the expression of suppressor genes, or reduce the
expression of dominant genes. The present invention is directed to identifying genes of either type,
particularly those of the second type.

The most frequently studied mechanism for gene overexpression in cancer cells is
sometimes referred to as amplification. This is a process whereby the gene is duplicated within the
chromosomes of the ancestral cell into multiple copies. The process involves unscheduled
replications of the region of the chromosome comprising the gene, followed by recombination of the
replicated segments back into the chromosome (Alitalo et al.). As a result, 50 or more copies of
the gene may be produced. The duplicated region is sometimes referred to as an "amplicon". The
level of expression of the gene (that is, the amount of messenger RNA produced) escalates in the
transformed cell in the same proportion as the number of copies of the gene that are made (Alitalo
et al.).

Several human oncogenes have been described, some of which are duplicated, for
example, in a significant proportion of breast tumors. A prototype is the erbB2 gene (also known
as HER-2/neu), which encodes a 185 kDa membrane growth factor receptor homologous to the
epidermal growth factor receptor. erbB2 is duplicated in 61 of 283 tumors (22%) tested in a recent
survey (Adnane et al.). Other oncogenes duplicated in breast cancer are the bek gene, duplicated
in 34 out of 286 (12%); the flg gene, duplicated in 37 out of 297 (12%), the myc gene, duplicated in
43 out of 275 (16%) (Adnane et al.).
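As a quick arithmetic check, the duplication frequencies quoted above can be recomputed from the tumor counts. The short Python sketch below is illustrative only; the dictionary and function names are hypothetical and are not part of the original disclosure.

```python
# Tumor counts quoted in the text as (duplicated, tested), attributed to Adnane et al.
DUPLICATION_COUNTS = {
    "erbB2": (61, 283),
    "bek": (34, 286),
    "flg": (37, 297),
    "myc": (43, 275),
}


def duplication_percent(duplicated: int, tested: int) -> float:
    """Percentage of tested tumors in which the gene is duplicated."""
    return 100.0 * duplicated / tested


if __name__ == "__main__":
    for gene, (dup, total) in DUPLICATION_COUNTS.items():
        print(f"{gene}: {dup}/{total} = {duplication_percent(dup, total):.1f}%")
```

Rounded to whole percentages, the computed values agree with the figures given in the text.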

Work with other oncogenes, particularly those described for neuroblastoma, suggested that
gene duplication of the proto-oncogene was an event involved in the more malignant forms of
cancer, and could act as a predictor of clinical outcome (reviewed by Schwab et al. and Alitalo et
al.). In breast cancer, duplication of the erbB2 gene has been reported as correlating both with
reoccurrence of the disease and decreased survival times (Slamon et al.). There is some evidence
that erbB2 helps identify tumors that are responsive to adjuvant chemotherapy with
cyclophosphamide, doxorubicin, and fluorouracil (Muss et al.).

It is clear that only a proportion of the genes that can undergo gene duplication in cancer
have been identified. First, chromosome abnormalities, such as double minute (DM) chromosomes
and homogeneously stained regions (HSRs), are abundant in cancer cells. HSRs are
chromosomal regions that appear in karyotype analysis with intermediate density Giemsa staining
throughout their length, rather than with the normal pattern of alternating dark and light bands.
They correspond to multiple gene repeats. HSRs are particularly abundant in breast cancers,
showing up in 60-65% of tumors surveyed (Dutrillaux et al., Zafrani et al.). When such regions are
checked by in situ hybridization with probes for any of 16 known human oncogenes, including
erbB2 and myc, only a proportion of tumors show any hybridization to HSR regions. Furthermore,
only a proportion of the HSRs within each karyotype are implicated.

Second, comparative genomic hybridization (CGH) has revealed the presence of copy
number increases in tumors, even in chromosomal regions outside of HSRs. CGH is a new
method in which whole chromosome spreads are stained simultaneously with DNA fragments from
normal cells and from cancer cells, using two different fluorochromes. The images are
computer-processed for the fluorescence ratio, revealing chromosomal regions that have
undergone amplification or deletion in the cancer cells (Kallioniemi et al. 1992). This method was
recently applied to 15 breast cancer cell lines (Kallioniemi et al. 1994). DNA sequence copy
number increases were detected in all 23 chromosome pairs.
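The ratio analysis described for CGH can be illustrated with a minimal Python sketch. It assumes that per-region fluorescence intensities have already been measured for the cancer and normal DNA. The function name, the region labels, and the gain and loss thresholds (1.25 and 0.75) are hypothetical choices made for illustration; they are not values taken from this disclosure or from the cited work.

```python
from typing import Dict


def classify_regions(tumor: Dict[str, float],
                     normal: Dict[str, float],
                     gain: float = 1.25,
                     loss: float = 0.75) -> Dict[str, str]:
    """Label each chromosomal region by its tumor/normal fluorescence ratio.

    The default thresholds are illustrative assumptions, not values from the text.
    """
    calls = {}
    for region, tumor_signal in tumor.items():
        normal_signal = normal[region]
        if normal_signal <= 0:
            raise ValueError(f"non-positive normal signal for {region}")
        ratio = tumor_signal / normal_signal
        if ratio >= gain:
            calls[region] = "amplified"
        elif ratio <= loss:
            calls[region] = "deleted"
        else:
            calls[region] = "unchanged"
    return calls


if __name__ == "__main__":
    # Made-up intensities for three hypothetical regions.
    tumor = {"region_A": 2.4, "region_B": 1.0, "region_C": 0.5}
    normal = {"region_A": 1.0, "region_B": 1.0, "region_C": 1.0}
    print(classify_regions(tumor, normal))
```

A ratio near 1 indicates equal representation in cancer and normal DNA; in practice the cut-offs would need to be calibrated against control hybridizations.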

Cloning the genes that undergo duplication in cancer is a formidable challenge. In one
approach, human oncogenes have been identified by hybridizing with probes for other known
growth-promoting genes, particularly known oncogenes in other species. For example, the erbB2
gene was identified using a probe from a chemically induced rat neuroglioblastoma (Slamon et al.).
Genes with novel sequences and functions will evade this type of search. In another approach,
genes may be cloned from an area identified as containing a duplicated region by CGH method.
Since CGH is able to indicate only the approximate chromosomal region of duplicated genes, an
extensive amount of experimentation is required to walk through the entire region and identify the
particular gene involved.

Genes may also be overexpressed in cancer without being duplicated. Methods that rely
on identification from genetic abnormalities necessarily bypass such genes. Increased expression
can come about through a higher level of transcription of the gene; for example, by up-regulation of
the promoter or substitution with an alternative promoter. It can also occur if the transcription
product is able to persist longer in the cell; for example, by increasing the resistance to cytoplasmic
RNase or by reducing the level of such cytoplasmic enzymes. Two examples are the epidermal
growth factor receptor, overexpressed in 45% of breast cancer tumors (Klijn et al.), and the IGF-1
receptor, overexpressed in 50-93% of breast cancer tumors (Berns et al.). In almost all cases, the
overexpression of each of these receptors is by a mechanism other than gene duplication.

One way of examining overexpression at the messenger RNA level is by subtractive
hybridization. It involves producing positive and negative cDNA strands from two RNA
preparations, and looking for cDNA which is not completely hybridized by the opposing preparation.
This is a laborious procedure which has distinct limitations in cancer research. In particular, since
each subtraction involves cDNA from only two cell populations at a time, it is sensitive to individual
phenotypic differences due not just to the presence of cancer, but also through natural metabolic
variations.

Another way of examining overexpression at the messenger RNA level is by differential
display (Liang et al. 1992a). In this technique, cDNA is prepared from only a subpopulation of each
RNA preparation, and expanded via the polymerase chain reaction using primers of particular
specificity. Similar subpopulations are compared across several RNA preparations by gel
autoradiography for expression differences. In order to survey the RNA preparations entirely, the
assay is repeated with a comprehensive set of PCR primers. The screening strategy more
effectively includes multiple positive and negative control samples (Sunday et al.). The method has
recently been applied to breast cancer cell lines, and highlights a number of expression differences
(Liang et al. 1992b; Chen et al., McKenzie et al., Watson et al. 1994 & 1996, Kocher et al.). By
excising the corresponding region of the separating gel, it is possible to recover and sequence the
cDNA.

Despite the advancement provided by differential display, problems remain in terms of
applying it in the search for new cancer genes. First, because this is a test for RNA levels, any
phenotypic difference between cell lines constitutes part of the recovered set, leading to a large
proportion of "false positive" identifications. It has been found that cDNA for mitochondrial genes
constitute a large proportion of the differentially expressed bands, and it consumes substantial
resources to recover the sample and obtain a partial sequence in order to eliminate them. Second,
false positive identifications are made for reasons attributed to multiple cDNA species and
competition for the PCR primers by RNA species of different abundance (Debouck). Third,
differential display highlights high copy number mRNAs and shorter mRNAs (Bertioli et al.,
Yeatman et al.), and may therefore miss critical cancer-associated transcripts when used as a
survey technique. Fourth, a number of adjustments are made to gene expression levels when a
cell undergoes malignant transformation or is cultured in vitro. Most of these adjustments are
secondary, and not part of the transformation process. Thus, even when a novel sequence is
obtained from the differential display, it is far from certain that the corresponding gene is at the root
of the disease process.

An early step in developing gene-specific therapeutic approaches is the identification of
genes that are more central to malignant transformation or the persistence of the malignant
phenotype.

DISCLOSURE OF THE INVENTION

It is an objective of this invention to provide a method for identifying and characterizing
genes and gene products which are duplicated or associated with overabundant RNA in cancer
cells. The method can be used for any type of cancer, providing a plurality of cell populations or
cell lines of the type of cancer are available, in conjunction with a suitable control cell population.
The method is highly effective in identifying genes and gene products that are intimately related to
malignant transformation or maintenance of the malignant properties of the cancer cells.

An important derivative of applying the method is the selection and retrieval of cDNA and
cDNA fragments corresponding to the cancer-associated gene. These fragments can be used
inter alia to determine the nucleotide sequence of the gene and mRNA, the amino acid sequence of
any encoded protein, or to retrieve from a cDNA or genomic library additional polynucleotides
related to the gene or its transcripts. Since the genes are typically involved in the malignant
process of the cell, the polynucleotides, polypeptides, and antibodies derived by using this method
can in turn be used to design or screen important diagnostic reagents and therapeutic compounds.

Another objective of this invention is to provide isolated polynucleotides, polypeptides, and
antibodies derived from four novel genes which are associated with several different types of cancer,
including breast cancer. The genes are designated CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and
CH14-2a16-1. These designations refer to both strands of the cDNA and fragments thereof, and to
the respective corresponding messenger RNA, including splice variants, allelic variants, and
fragments of any of these forms. These genes show RNA overabundance in a majority of cancer cell
lines tested. A majority of the cells showing RNA overabundance also have duplication of the
corresponding gene. Another object of this invention is to provide materials and methods based on
these polynucleotides, polypeptides, and antibodies for use in the diagnosis and treatment of cancer,
particularly breast cancer.

Accordingly, one embodiment of this invention is an isolated polynucleotide comprising a
linear sequence contained in a polynucleotide selected from the group consisting of CH1-9a11-2,
CH8-2a13-1, CH13-2a12-1, and CH14-2a16-1. The linear sequence is contained in a duplicated
gene or overabundant RNA in cancerous cells. The RNA may be overabundant due to gene
duplication, increased RNA transcription or processing, increased RNA persistence, any combination
thereof, or by any other mechanism, in a proportion of breast cancer cells. Preferably, the RNA is
overabundant in at least about 20% of a representative panel of breast cancer cell lines, such as the
panels listed herein; more preferably, it is overabundant in at least about 40% of the panel; even more
preferably, it is overabundant in at least 60% or more of the panel. Preferably, the RNA is
overabundant in at least about 5% of spontaneously occurring breast cancer tumors; more preferably,
it is overabundant in at least about 10% of such tumors; more preferably, it is overabundant in at least
about 20% of such tumors; more preferably, it is overabundant in at least about 30% of such tumors;
even more preferably, it is overabundant in at least about 50% of such tumors.

Preferably, a sequence of at least 10 nucleotides is essentially identical between the isolated
polynucleotide of the invention and a cDNA from CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and
CH14-2a16-1; more preferably, a sequence of at least about 15 nucleotides is essentially identical; more
preferably, a sequence of at least about 20 nucleotides is essentially identical; more preferably, a
sequence of at least about 30 nucleotides is essentially identical; more preferably, a sequence of at
least about 40 nucleotides is essentially identical; even more preferably, a sequence of at least about
70 nucleotides is essentially identical; still more preferably, a sequence of about 100 nucleotides or
more is essentially identical. A further embodiment of this invention is an isolated polynucleotide
comprising a linear sequence essentially identical to a sequence selected from the group consisting of
SEQ. ID NO:15, SEQ. ID NO:18, SEQ. ID NO:21, SEQ. ID NO:23, SEQ. ID NO:26, SEQ. ID NO:29,
SEQ. ID NO:31, SEQ. ID NO:33, and SEQ. ID NO:35. These embodiments include an isolated
polynucleotide which is a DNA polynucleotide, an RNA polynucleotide, a polynucleotide probe, or a
polynucleotide primer.

This invention also provides an isolated polypeptide comprising a sequence of amino acids
essentially identical to the polypeptide encoded by or translated from a polynucleotide selected from
the group consisting of CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and CH14-2a16-1. Preferably, a
sequence of at least about 5 amino acids is essentially identical between the polypeptide of this
invention and that encoded by the polynucleotide; more preferably, a sequence of at least about 10
amino acids is essentially identical; more preferably, a sequence of at least 15 amino acids is
essentially identical; even more preferably, a sequence of at least 20 amino acids is essentially
identical; still more preferably, a sequence of about 30 amino acids or more is essentially identical.
Preferably, the polypeptide comprises a linear sequence of at least 15 amino acids essentially
identical to a sequence encoded by said polynucleotide. Another embodiment of this invention is a
polypeptide comprising a linear sequence essentially identical to a sequence selected from the group
consisting of SEQ. ID NO:17, SEQ. ID NO:20, SEQ. ID NO:25, SEQ. ID NO:28, SEQ. ID NO:30,
SEQ. ID NO:32, SEQ. ID NO:34, and SEQ. ID NO:37.

A further embodiment of this invention is an antibody specific for a polypeptide embodied in
this invention. This encompasses both monoclonal and isolated polyclonal antibodies.

A further embodiment of this invention is a method of using the polynucleotides of this
invention for detecting or measuring gene duplication in cancerous cells, especially but not limited to
breast cancer cells, comprising the steps of reacting DNA contained in a clinical sample with a
reagent comprising the polynucleotide, said clinical sample having been obtained from an individual
suspected of having cancerous cells; and comparing the amount of complexes formed between the
reagent and the DNA in the clinical sample with the amount of complexes formed between the
reagent and DNA in a control sample.

A further embodiment is a method of using the polynucleotides of this invention for detecting
or measuring overabundance of RNA in cancerous cells, especially but not limited to breast cancer
cells, comprising the steps of reacting RNA contained in a clinical sample with a reagent comprising
the polynucleotide, said clinical sample having been obtained from an individual suspected of having
cancerous cells; and comparing the amount of complexes formed between the reagent and the RNA
in the clinical sample with the amount of complexes formed between the reagent and RNA in a control
sample.

Another embodiment of this invention is a diagnostic kit for detecting or measuring gene
duplication or RNA overabundance in cells contained in an individual as manifest in a clinical sample,
comprising a reagent and a buffer in suitable packaging, wherein the reagent comprises a
polynucleotide of this invention.

Another embodiment of this invention is a method of using a polypeptide of this invention for
detecting or measuring specific antibodies in a clinical sample, comprising the steps of reacting
antibodies contained in the clinical sample with a reagent comprising the polypeptide, said clinical
sample having been obtained from an individual suspected of having cancerous cells, especially but
not limited to breast cancer cells; and comparing the amount of complexes formed between the
reagent and the antibodies in the clinical sample with the amount of complexes formed between the
reagent and antibodies in a control sample.

Another embodiment of this invention is a method of using an antibody of this invention for
detecting or measuring altered protein expression in a clinical sample, comprising the steps of
reacting a polypeptide contained in the clinical sample with a reagent comprising the antibody, said
clinical sample having been obtained from an individual suspected of having cancerous cells,
especially but not limited to breast cancer cells; and comparing the amount of complexes formed
between the reagent and the polypeptide in the clinical sample with the amount of complexes formed
between the reagent and a polypeptide in a control sample. Further embodiments of this invention
are diagnostic kits for detecting or measuring a polypeptide or antibody present in a clinical sample,
comprising a reagent and a buffer in suitable packaging, wherein the reagent respectively comprises
either an antibody or a polypeptide of this invention.

Yet another embodiment of this invention is a host cell transfected by a polynucleotide of this
invention. A further embodiment of this invention is a method for using a polynucleotide for screening
a pharmaceutical candidate, comprising the steps of separating progeny of the transfected host cell
into a first group and a second group; treating the first group of cells with the pharmaceutical
candidate; not treating the second group of cells with the pharmaceutical candidate; and comparing
the phenotype of the treated cells with that of the untreated cells.

This invention also embodies a pharmaceutical preparation for use in cancer therapy,
comprising a polynucleotide or polypeptide embodied by this invention, said preparation being
capable of reducing the pathology of cancerous cells, especially for but not limited to breast cancer
cells. Further embodiments of this invention are methods for treating an individual bearing cancerous
cells, such as breast cancer cells, comprising administering any of the aforementioned
pharmaceutical preparations.

Still another embodiment of this invention is a pharmaceutical preparation or active vaccine
comprising a polypeptide embodied by this invention in an immunogenic form and a pharmaceutically
compatible excipient. A further embodiment is a method for treatment of cancer, especially but not
limited to breast cancer, either prophylactically or after cancerous cells are present in an individual
being treated, comprising administration of the aforementioned pharmaceutical preparation.

Another series of embodiments of this invention relate to methods for obtaining cDNA
corresponding to a gene associated with cancer, comprising the steps of: a) supplying an RNA
preparation from uncultured control cells; b) supplying RNA preparations from at least two different
cancer cells; c) displaying cDNA corresponding to the RNA preparations of step a) and step b)
such that different cDNA corresponding to different RNA in each preparation are displayed
separately; d) selecting cDNA corresponding to RNA that is present in greater abundance in the
cancer cells of step b) relative to the control cells of step a); e) supplying a digested DNA
preparation from control cells; f) supplying digested DNA preparations from at least two different
cancer cells; g) hybridizing the cDNA of step d) with the digested DNA preparations of step e) and
step f); and h) further selecting cDNA from the cDNA of step d) corresponding to genes that are
duplicated in the cancer cells of step f) relative to the control cells of step e).
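The two selection steps, d) and h), amount to successive filters over the displayed cDNA. The Python sketch below illustrates that logic under stated assumptions: band intensities and gene copy signals are taken as already quantified, "greater abundance" is read as exceeding the highest control signal by a fold factor, and the names `select_overabundant` and `select_duplicated`, the data layout, and the fold thresholds are all hypothetical choices made for illustration rather than part of the claimed method.

```python
from typing import Dict, List

# cDNA identifier -> {"control": [signals...], "cancer": [signals...]}
Signals = Dict[str, Dict[str, List[float]]]


def _count_elevated(cancer: List[float], control: List[float], fold: float) -> int:
    """Number of cancer samples whose signal exceeds fold x the highest control."""
    baseline = max(control)
    return sum(1 for value in cancer if value > fold * baseline)


def select_overabundant(rna: Signals, fold: float = 2.0) -> List[str]:
    """Step d): keep cDNA whose RNA is more abundant in every cancer preparation."""
    return [cid for cid, s in rna.items()
            if _count_elevated(s["cancer"], s["control"], fold) == len(s["cancer"])]


def select_duplicated(candidates: List[str], dna: Signals,
                      fold: float = 1.5, min_cells: int = 1) -> List[str]:
    """Step h): keep candidates whose gene gives a higher signal in cancer cell DNA."""
    return [cid for cid in candidates
            if _count_elevated(dna[cid]["cancer"], dna[cid]["control"], fold) >= min_cells]


if __name__ == "__main__":
    # Made-up signals for two hypothetical display bands.
    rna = {
        "band1": {"control": [1.0], "cancer": [3.0, 4.0]},
        "band2": {"control": [1.0], "cancer": [1.1, 5.0]},
    }
    dna = {"band1": {"control": [1.0], "cancer": [2.0, 1.0]}}
    step_d = select_overabundant(rna)
    print(select_duplicated(step_d, dna))
```

Requiring elevation in every cancer preparation at step d) is a deliberately strict reading; a looser rule (for example, a minimum number of preparations) could be substituted by changing the comparison.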

One or more enhancements may optionally be included in the methods of this invention,
including the following:

1. Cancer cells are preferably used for step b) that share a duplicated gene in the same
region of a chromosome. If desired, the practitioner may test cancer cells beforehand
to detect the duplication or deletion of chromosome regions; or cancer cell lines may
be used that have already been characterized in this respect.

2. A higher plurality of cancer cells are preferably used to provide DNA for step b), step f),
or preferably both step b) and step f). The use of three cancer cells is preferred over
two; the use of four cancer cells is more preferred, about five cancer cells is still more
preferred, about eight cancer cells is even more preferred. The cDNA of each cancer
cell population is displayed or hybridized separately, in accordance with the method.

3. A higher plurality of control cells are preferably used to provide DNA for step a), step
e), or preferably both step a) and step e). The use of two control cell populations is
preferred; the use of three or more is even more preferred. Both proliferating and non-
proliferating populations are preferably used, if available.

4. The control cells are preferably supplied fresh from a tissue source, and are not
cultured or transformed into a cell line. This is increasingly important when the control
cell populations used in step a) is only one or two in number. Freshly obtained cancer
cells may also be used as an alternative to cancer cell lines, although this is less
critical.

5. An additional screening step is preferably conducted in which the cDNA corresponding
to the putative cancer-associated gene is additionally hybridized with a digested
mitochondrial DNA preparation, to eliminate mitochondrial genes. This screening step
may be conducted before, between, subsequent to, or simultaneously with the other
screening steps of the method.

6. An additional screening step is preferably conducted in which RNA is supplied from a
plurality of cancer cells, and one or preferably more control cell populations; the RNA is
contacted with cDNA corresponding to the putative cancer-associated gene under
conditions that permit formation of a stable duplex, and cDNA is selected
corresponding to RNA that is present in greater abundance in a proportion of the
cancer cells relative to the control cells. Preferably, the plurality of cancer cells is a
panel of at least five, preferably at least ten cells. Preferably at least three, more
preferably at least five of the cancer cells show greater abundance of RNA. Preferably
at least one and preferably more of the cancer cells shows a greater abundance of
RNA compared with control cells, but does not show duplication of the corresponding
gene in step h) of the method.
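
The panel preferences stated in enhancement 6 can be expressed as a simple check. The Python sketch below is a minimal illustration: the `CellLineResult` record and the `meets_panel_criteria` function are hypothetical names, and the default thresholds (a panel of at least five, at least three cells with RNA overabundance, at least one of those without gene duplication) are the least stringent of the preferences recited above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CellLineResult:
    name: str
    rna_overabundant: bool  # greater RNA abundance than the control cells
    gene_duplicated: bool   # duplication of the corresponding gene found in step h)


def meets_panel_criteria(panel: List[CellLineResult],
                         min_panel: int = 5,
                         min_overabundant: int = 3,
                         min_without_duplication: int = 1) -> bool:
    """Check a screening panel against the preferences of enhancement 6."""
    overabundant = [r for r in panel if r.rna_overabundant]
    without_dup = [r for r in overabundant if not r.gene_duplicated]
    return (len(panel) >= min_panel
            and len(overabundant) >= min_overabundant
            and len(without_dup) >= min_without_duplication)


if __name__ == "__main__":
    # Made-up results for five hypothetical cancer cell lines.
    panel = [
        CellLineResult("line1", True, True),
        CellLineResult("line2", True, True),
        CellLineResult("line3", True, False),
        CellLineResult("line4", False, False),
        CellLineResult("line5", False, False),
    ]
    print(meets_panel_criteria(panel))
```

The stricter preferences (a panel of at least ten, at least five overabundant cells) can be tested by passing larger values for the keyword arguments.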
Other embodiments of the invention are methods for obtaining cDNA corresponding to a
gene that is deleted or underexpressed in cancer, comprising the steps of: a) supplying an RNA
preparation from control cells; b) supplying RNA preparations from at least two different cancer
cells that share a deleted gene in the same region of a chromosome; c) displaying cDNA
corresponding to the RNA preparations of step a) and step b) such that different cDNA
corresponding to different RNA in each preparation are displayed separately; and d) selecting
cDNA corresponding to RNA that is present in lower abundance in the cancer cells of step b)
relative to the control cells of step a). Such methods typically comprise the following further steps:
e) supplying a digested DNA preparation from control cells; f) supplying digested DNA
preparations from at least two different cancer cells; g) hybridizing the cDNA of step d) with the
digested DNA preparations of step e) and step f); and h) further selecting cDNA from the cDNA of
step d) corresponding to a gene that is deleted in the cancer cells of step f) relative to the control
cells of step e). Such methods for identifying deleted or underexpressed genes may also comprise
enhancements such as those described above.

Additional embodiments of this invention are methods for characterizing cancer genes,
comprising obtaining cDNA corresponding to a cancer-associated gene according to a method of
this invention, particularly those highlighted above, and then sequencing the cDNA. Alternatively or
in addition, the cDNA may be used to rescue additional polynucleotides corresponding to a cancer-
associated gene from an mRNA preparation, or a cDNA or genomic DNA library.

Additional embodiments of this invention are methods for screening candidate drugs for
cancer treatment, comprising obtaining cDNA corresponding to a gene that is duplicated,
overexpressed, deleted, or underexpressed in cancer, and comparing the effect of the candidate
drug on a cell genetically altered with the cDNA or fragment thereof with the effect on a cell not
genetically altered.

Various embodiments of this invention may be employed in pursuit of any form of cancer
for which suitable tissue sources are available. Cancers of particular interest include lung cancer,
glioblastoma, pancreatic cancer, colon cancer, prostate cancer, hepatoma, myeloma, and breast
cancer.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a half-tone reproduction of an autoradiogram of a differential display experiment, in which
radiolabeled cDNA corresponding to a subset of total messenger RNA in different cells are compared.
This is used to select cDNA corresponding to particular RNA that are overabundant in breast cancer.

Figure 2 is a half-tone reproduction of an autoradiogram of electrophoresed DNA digests from a
panel of breast cancer cell lines probed with a CH8-2a13-1 insert (Panel A) or a loading control (Panel
B).

Figure 3 is a half-tone reproduction of an autoradiogram of electrophoresed total RNA from a panel of
breast cancer cell lines probed with a CH8-2a13-1 insert (Panel A) or a loading control (Panel B).

Figure 4 is a half-tone reproduction of an autoradiogram of electrophoresed DNA digests from a
panel of breast cancer cell lines probed with a CH13-2a12-1 insert.

Figure 5 is a half-tone reproduction of an autoradiogram of electrophoresed total RNA from a panel of
breast cancer cell lines probed with a CH13-2a12-1 insert.

Figure 6 is a map of cDNA fragments obtained for the breast cancer associated genes CH1-9a11-2,
CH8-2a13-1, CH13-2a12-1 and CH14-2a16-1. Regions of the fragments used to deduce sequence
data listed in the application are indicated by shading. Nucleotide positions are numbered from the
left-most residue for which double-strand sequence data has been obtained, which is not necessarily
the 5' terminus of the corresponding message.

FlgureJls a listing of primers used for obtaining the cDNAsequence data for CH1-9a11-2. 
Figure 6 is a listing of cDNA sequence obtained for CH1-9a1 1-2. 

20 

Figure 9 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH1-9a11-2 shown in Figure 8. The single-letter amino acid code is used.
Stop codons are indicated by a dot (•). The upper panel shows the complete amino acid translation;
the lower panel shows the predicted gene product protein sequence. A possible transmembrane
region is indicated by underlining.

Figure 10 is a listing of primers used for obtaining the cDNA sequence data for CH8-2a13-1.

Figure 11 is a listing of cDNA sequence obtained for CH8-2a13-1.

Figure 12 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH8-2a13-1 shown in Figure 11. The upper panel shows the complete amino
acid translation; the lower panel shows the predicted gene product protein sequence.

Figure 13 is a listing of the nucleotide sequence predicted for a full-length CH8-2a13-1 cDNA.

Figure 14 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH8-2a13-1 shown in Figure 13.

Figure 15 is a listing of primers used for obtaining the cDNA sequence data for CH13-2a12-1.

Figure 16 is a listing of cDNA sequence obtained for CH13-2a12-1.

Figure 17 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH13-2a12-1 shown in Figure 16. The upper panel shows the complete amino
acid translation; the lower panel shows the predicted gene product protein sequence.

Figure 18 is a listing of primers used for obtaining cDNA sequence data for CH13-2a12-1.

Figure 19 is a listing of the cDNA sequence data obtained by two-directional sequencing for
CH14-2a16-1.

Figure 20 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH14-2a16-1 shown in Figure 19. The upper panel shows the complete amino
acid translation; the lower panel shows the predicted gene product protein sequence. Residues
corresponding to three zinc finger motifs are underlined, indicating that the protein may have DNA or
RNA binding activity.

Figure 21 is a listing of additional DNA sequence data towards the 5' end of CH14-2a16-1 obtained
by one-directional sequencing of the fragment pCH14-13. The first two panels show nucleotide and
amino acid sequence from the 5' end of the fragment; the second two panels show nucleotide and
amino acid sequence from the 3' end of the fragment. Regions of overlap with pCH14-800 are
underlined.

Figure 22 is a listing of the nucleotide sequences of initial fragments obtained corresponding to the
four breast cancer associated genes, along with their amino acid translations.

Figure 23 is a listing of additional cDNA sequence obtained for CH1-9a11-2, comprising
approximately 1934 base pairs 5' from the sequence of Figure 8.

Figure 24 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH1-9a11-2 shown in Figure 23. The single-letter amino acid code is used.
Stop codons are indicated by a dot (•).

Figure 25 is a listing of additional cDNA sequence obtained for CH14-2a16-1, comprising
approximately 1934 base pairs 5' from the sequence of Figure 19.

Figure 26 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH1-9a11-2 shown in Figure 25. The single-letter amino acid code is used.
Stop codons are indicated by a dot (•). The upper panel shows the complete amino acid translation;
the lower panel shows the predicted gene product protein sequence.

Best Mode for Carrying Out the Invention

This invention relates to the discovery and characterization of four novel genes associated
with breast cancer. The cDNA of these genes, and their sequences as disclosed below, provide the
basis of a series of reagents that can be used in diagnosis and therapy.

Using a panel of about 15 cancer cell lines, each of the four genes was found to be duplicated
in 40-60% of the cells tested. Surprisingly, each of the four genes was duplicated in at least one cell
line where studies using comparative genomic hybridization had not revealed any amplification of the
corresponding chromosomal region.

Levels of expression at the mRNA level were tested in a similar panel for two of these four
genes. In addition to those cell lines showing gene duplication, 17 to 37% of the lines showed RNA
overabundance without gene duplication, indicating that the malignant cells had used some
mechanism other than gene duplication to promote the abundance of RNA corresponding to these
genes. All four of the breast cancer genes have open reading frames, and likely are transcribed at
various levels in different cell types. Overabundance of the corresponding RNA in a cancerous cell is
likely associated with overexpression of the protein gene product. Such overexpression may be
manifest as increased secretion of the protein from the cell into blood or the surrounding environment,
an increased density of the protein at the cell surface, or an increased accumulation of the protein within
the cell, in comparison to the typical level in noncancerous cells of the same tissue type.

Different tumors bear different genotypes and phenotypes, even when derived from the same
tissue. Gene therapy in cancer is more likely to be effective if it is aimed at genes that are involved in
supporting the malignancy of the cancer. This invention discloses genes that achieve RNA
overabundance by several mechanisms, because they are more likely to be directly involved in the
pathogenic process, and therefore suitable targets for pharmacological manipulation.

Features of the four novel genes, the respective mRNA, and the cDNA used to find them are 
provided in Table 1. 



WO 97/38085 PCT/US97/05930

Table 1

Gene           mRNA              Exemplary cDNA fragments
CH1-9a11-2     5.5 kb, 4.5 kb    1.1 kb, 2.5 kb
CH8-2a13-1     4.2 kb            0.6 kb (two), 3.0 kb, 4.0 kb
CH13-2a12-1    3.5 kb, 3.2 kb    1.6 kb, 3.5 kb
CH14-2a16-1    3.8 kb, 3 kb      0.8 kb, 1.3 kb, 1.6 kb, 2.5 kb



All four gene sequences are unrelated to other genes known to be overexpressed in breast
cancer, including the erbB2 gene (Adnane et al.), tissue factor (Chen et al.), mammaglobulin (Watson
et al.), and DD96 (Kocher et al.).
The four mRNA sequences each comprise an open reading frame. The CH1-9a11-2 gene is
expressed at the mRNA level at relatively elevated levels in pancreas and testis. The CH8-2a13-1
gene is expressed at relatively elevated levels in adult heart, spleen, thymus, small intestine, colon,
and tissues of the reproductive system; and at higher levels in certain tissues of the fetus. The
CH13-2a12-1 gene is expressed at relatively elevated levels in heart, skeletal muscle, and testis. The
CH14-2a16-1 gene is expressed at relatively elevated levels in testis. The level of expression of all four
genes is especially high in a substantial proportion of breast cancer cell lines.

The CH1-9a11-2 gene encodes a protein with a putative transmembrane region, and may be
expressed as a surface protein on cancer cells. The CH13-2a12-1 gene is distantly related to a
C. elegans gene implicated in cell cycle regulation, and may play a role in the regulation of cell
proliferation. The protein encoded by CH13-2a12-1 is distantly related to a vasopressin-activated
calcium binding receptor, and may have Ca2+ binding activity. The CH14-2a16-1 comprises at least
five domains of a zinc finger binding motif and is distantly related to a yeast RNA binding protein. The
CH14-2a16-1 gene product is suspected of having DNA or RNA binding activity, which may relate to a
role in cancer pathogenesis.

The four genes described here are exemplars of genes that undergo altered expression in
cancer, identifiable using the gene screening methods of the invention. The method involves an
analysis for both DNA duplication and altered RNA abundance relating to the same gene. Since
abnormal gene regulation is central to the malignant process, the identification method may be
brought to bear on any type of cancer.

The screening method is superior to any previously available approach in several respects.
Particularly significant is that screening is rapidly focused towards genes that are central to the
malignant process, and away from those that have variable levels of expression as part of normal
metabolic processes. Furthermore, because the end-product is a cDNA corresponding to the
gene, the process leads rapidly to detailed characterization of the gene, and any effector molecule
it may encode. This in turn leads to development of new diagnostic and therapeutic materials and
techniques.

Definitions

Terms used in this application include the following:

The term "polynucleotide" refers to a polymeric form of nucleotides of any length, either
deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any
three-dimensional structure, and may perform any function, known or unknown. The following are
non-limiting examples of polynucleotides: a gene or gene fragment, exons, introns, messenger RNA
(mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched
polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence,
nucleic acid probes, and primers. A polynucleotide may comprise modified nucleotides, such as
methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure
may be imparted before or after assembly of the polymer. The sequence of nucleotides may be
interrupted by non-nucleotide components. A polynucleotide may be further modified after
polymerization, such as by conjugation with a labeling component.

The term polynucleotide, as used herein, refers interchangeably to double- and
single-stranded molecules. Unless otherwise specified or required, any embodiment of the invention
described herein that is a polynucleotide encompasses both the double-stranded form, and each of
two complementary single-stranded forms known or predicted to make up the double-stranded form.

In the context of polynucleotides, a "linear sequence" or a "sequence" is an order of
nucleotides in a polynucleotide in a 5' to 3' direction in which residues that neighbor each other in the
sequence are contiguous in the primary structure of the polynucleotide. A "partial sequence" is a
linear sequence of part of a polynucleotide which is known to comprise additional residues in one or
both directions.

"Hybridization" refers to a reaction in which one or more polynucleotides react to form a
complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues. The
hydrogen bonding is sequence-specific, and typically occurs by Watson-Crick base pairing. A
hybridization reaction may constitute a step in a more extensive process, such as the initiation of a
PCR, or the enzymatic cleavage of a polynucleotide by a ribozyme.

Hybridization reactions can be performed under conditions of different "stringency". Relevant
conditions include temperature, ionic strength, time of incubation, the presence of additional solutes in
the reaction mixture such as formamide, and the washing procedure. Higher stringency conditions
are those conditions, such as higher temperature and lower sodium ion concentration, which require
higher minimum complementarity between hybridizing elements for a stable hybridization complex to
form. Conditions that increase the stringency of a hybridization reaction are widely known and
published in the art: see, for example, "Molecular Cloning: A Laboratory Manual", Second Edition
(Sambrook, Fritsch & Maniatis, 1989).
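
As a rough illustration of how temperature, salt and formamide interact, the following sketch estimates the melting temperature of a long DNA:DNA hybrid with a widely used empirical formula (commonly attributed to Meinkoth and Wahl): Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 0.61(%formamide) - 500/L. The function names, the default probe properties, and the assumed figure of about 0.195 M Na+ for 1 x SSC are illustrative assumptions and are not taken from this application. The formula is an approximation that loses accuracy at high salt, so the result is best read as a qualitative ranking of conditions.

```python
import math

# Assumption: 1 x SSC (0.15 M NaCl, 15 mM trisodium citrate) supplies about 0.195 M Na+.
NA_MOLAR_PER_1X_SSC = 0.195


def estimate_tm(gc_percent, length, ssc_fold, formamide_percent=0.0):
    """Approximate melting temperature (deg C) of a long DNA:DNA hybrid."""
    na = NA_MOLAR_PER_1X_SSC * ssc_fold
    return (81.5 + 16.6 * math.log10(na) + 0.41 * gc_percent
            - 0.61 * formamide_percent - 500.0 / length)


def is_more_stringent(cond_a, cond_b, gc_percent=50.0, length=500):
    """True if cond_a leaves a smaller margin below the estimated Tm than cond_b.

    Each condition is a tuple (temperature_C, ssc_fold, formamide_percent).
    The default probe properties are hypothetical.
    """
    def margin(cond):
        temp, ssc, formamide = cond
        return estimate_tm(gc_percent, length, ssc, formamide) - temp

    return margin(cond_a) < margin(cond_b)
```

For instance, `is_more_stringent((60, 6, 0), (40, 6, 0))` compares 60°C with 40°C in 6 x SSC, and `is_more_stringent((40, 0.5, 0), (40, 6, 0))` compares low salt with high salt at the same temperature; both return `True`, in line with the definition above.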

When hybridization occurs in an antiparallel configuration between two single-stranded
polynucleotides, those polynucleotides are described as "complementary". A double-stranded
polynucleotide can be "complementary" to another polynucleotide, if hybridization can occur between
one of the strands of the first polynucleotide and the second. Complementarity (the degree that one
polynucleotide is complementary with another) is quantifiable in terms of the proportion of bases in
opposing strands that are expected to form hydrogen bonding with each other, according to generally
accepted base-pairing rules.
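
The proportion described above can be sketched for the simplest case, two strands of equal length aligned end to end in antiparallel orientation. This is only a minimal illustration: the function names are hypothetical, only standard Watson-Crick pairs are counted, and no attempt is made to handle strands of unequal length, bulges, or wobble pairs.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def reverse_complement(seq):
    """Return the DNA strand that pairs with seq in antiparallel orientation."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def complementarity(strand_a, strand_b):
    """Fraction of positions expected to pair when two equal-length strands,
    both written 5' to 3', are aligned antiparallel (U is treated like T)."""
    a = strand_a.upper().replace("U", "T")
    b = strand_b.upper().replace("U", "T")
    if len(a) != len(b):
        raise ValueError("this sketch only compares strands of equal length")
    if not a:
        return 0.0
    partner = reverse_complement(b)  # the strand that would pair perfectly with b
    matches = sum(1 for x, y in zip(a, partner) if x == y)
    return matches / len(a)
```

With these definitions, `complementarity("ATGC", "GCAT")` is 1.0, and a single mismatched base in a four-base strand lowers the value to 0.75.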

A linear sequence of nucleotides is "identical" to another linear sequence, if the order of
nucleotides in each sequence is the same, and occurs without substitution, deletion, or material
substitution. It is understood that purine and pyrimidine nitrogenous bases with similar structures can
be functionally equivalent in terms of Watson-Crick base-pairing; and the inter-substitution of like
nitrogenous bases, particularly uracil and thymine, or the modification of nitrogenous bases, such as
by methylation, does not constitute a material substitution. An RNA and a DNA polynucleotide have
identical sequences when the sequence for the RNA reflects the order of nitrogenous bases in the
polyribonucleotides, the sequence for the DNA reflects the order of nitrogenous bases in the
polydeoxyribonucleotides, and the two sequences satisfy the other requirements of this definition.
Where one or both of the polynucleotides being compared is double-stranded, the sequences are
identical if one strand of the first polynucleotide is identical with one strand of the second
polynucleotide.
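
A minimal sketch of this definition follows, assuming sequences are supplied as plain strings of A, C, G, T and U. The function names and the `a_double` / `b_double` flags are hypothetical. Base modifications such as methylation are not represented in the strings, which is consistent with their being treated as non-material here.

```python
def normalize(seq):
    """Upper-case the sequence and write uracil as thymine, so that RNA and
    DNA spellings of the same base order compare equal."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Complementary strand of seq, written 5' to 3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(normalize(seq)))


def strands(seq, double_stranded=False):
    """Set of 5'->3' strand sequences that a molecule presents."""
    top = normalize(seq)
    return {top, reverse_complement(top)} if double_stranded else {top}


def is_identical(seq_a, seq_b, a_double=False, b_double=False):
    """True if some strand of A has the same base order as some strand of B."""
    return bool(strands(seq_a, a_double) & strands(seq_b, b_double))
```

Here `is_identical("AUGGCC", "ATGGCC")` is `True` because uracil and thymine are interchangeable, and `is_identical("ATGGCA", "TGCCAT", b_double=True)` is `True` because the second molecule, being double-stranded, also presents the strand ATGGCA.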

A linear sequence of nucleotides is "essentially identical" to another linear sequence, if both
sequences are capable of hybridizing to form a duplex with the same complementary polynucleotide.
Sequences that hybridize under conditions of greater stringency are more preferred. It is understood
that hybridization reactions can accommodate insertions, deletions, and substitutions in the nucleotide
sequence. Thus, linear sequences of nucleotides can be essentially identical even if some of the
nucleotide residues do not precisely correspond or align. In general, essentially identical sequences
of about 40 nucleotides in length will hybridize at about 30°C in 10 x SSC (0.15 M NaCl, 15 mM
citrate buffer); preferably, they will hybridize at about 40°C in 6 x SSC; more preferably, they will
hybridize at about 50°C in 6 x SSC; even more preferably, they will hybridize at about 60°C in 6 x
SSC, or at about 40°C in 0.5 x SSC, or at about 30°C in 6 x SSC containing 50% formamide; still
more preferably, they will hybridize at 40°C or higher in 2 x SSC or lower in the presence of 50% or
more formamide. It is understood that the rigor of the test is partly a function of the length of the
polynucleotide; hence shorter polynucleotides with the same homology should be tested under lower
stringency and longer polynucleotides should be tested under higher stringency, adjusting the
conditions accordingly. The relationship between hybridization stringency, degree of sequence
identity, and polynucleotide length is known in the art and can be calculated by standard formulae;
see, e.g., Meinkoth et al. Sequences that correspond or align more closely to the invention disclosed
herein are comparably more preferred. Generally, essentially identical sequences are at least about
50% identical with each other, after alignment of the homologous regions. Preferably, the sequences
are at least about 60% identical; more preferably, they are at least about 70% identical; more
preferably, they are at least about 80% identical; more preferably, the sequences are at least about
90% identical; even more preferably, they are at least 95% identical; still more preferably, the
sequences are 100% identical. Percent identity is calculated as the percent of residues in the
sequence being compared that are identical to those in the reference sequence, which is usually one
of those listed or described in this application, unless stated otherwise. No penalty is imposed for
introduction of gaps in the reference or comparison sequence for purposes of alignment, but the
resulting fragments must be rationally derived; small gaps may not be introduced to trivially improve
the identity score.
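
The percent identity calculation can be sketched as follows, assuming the two sequences have already been aligned by some other means and are supplied as equal-length rows with "-" marking gaps. The function name and the gap symbol are assumptions. The sketch reads "no penalty" to mean that gaps carry no extra deduction: a gap in the comparison row is simply not counted as a residue, and a comparison residue facing a gap in the reference row counts as a residue that is not identical. It does not check that the gaps were rationally derived.

```python
def percent_identity(reference_aln, comparison_aln, gap="-"):
    """Percent of residues in the comparison sequence that match the
    reference at the same aligned column.

    Both arguments are rows of an existing alignment (equal length,
    gaps already placed).
    """
    if len(reference_aln) != len(comparison_aln):
        raise ValueError("aligned rows must have equal length")
    residues = 0
    identical = 0
    for ref, cmp_res in zip(reference_aln.upper(), comparison_aln.upper()):
        if cmp_res == gap:
            continue  # a gap in the comparison row is not a residue
        residues += 1
        if ref == cmp_res:  # a gap in the reference row never matches
            identical += 1
    return 100.0 * identical / residues if residues else 0.0
```

For example, `percent_identity("ATGCATGC", "ATGGATGC")` gives 87.5, since seven of the eight residues in the comparison sequence match the reference.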

In determining whether polynucleotide sequences are essentially identical, a sequence that
preserves the functionality of the polynucleotide with which it is being compared is particularly
preferred. Functionality may be established by different criteria, such as ability to hybridize with a
target polynucleotide, and whether the polynucleotide encodes an identical or essentially identical
polypeptide. Thus, nucleotide substitutions which cause a non-conservative substitution in the
encoded polypeptide are preferred over nucleotide substitutions that create a stop codon; nucleotide
substitutions that cause a conservative substitution in the encoded polypeptide are more preferred,
and identical nucleotide sequences are even more preferred. Insertions or deletions in the
polynucleotide that result in insertions or deletions in the polypeptide are preferred over those that
result in the down-stream coding region being rendered out of phase. The relative importance of
hybridization properties and the polypeptide encoded by a polynucleotide depends on the application
of the invention.

A "reagent" polynucleotide, polypeptide, or antibody, is a substance provided for a reaction,
the substance having some known and desirable parameters for the reaction. A reaction mixture may
also contain a "target", such as a polynucleotide, antibody, or polypeptide that the reagent is capable
of reacting with. For example, in some types of diagnostic tests, the amount of the target in a sample
is determined by adding a reagent, allowing the reagent and target to react, and measuring the
amount of reaction product. In the context of clinical management, a "target" may also be a cell,
collection of cells, tissue, or organ that is the object of an administered substance, such as a
pharmaceutical compound.

"cDNA" or "complementary DNA" is a single- or double-stranded DNA polynucleotide in which
one strand is complementary to a messenger RNA. "Full-length cDNA" is cDNA comprised of a strand
which is complementary to an entire messenger RNA molecule. A "cDNA fragment" as used herein
generally represents a sub-region of the full-length form, but the entire full-length cDNA may also be
included. Unless explicitly specified, the term cDNA encompasses both the full-length form and the
fragment form.

Different polynucleotides are said to "correspond" to each other if one is ultimately derived
from another. For example, messenger RNA corresponds to the gene from which it is transcribed.
cDNA corresponds to the RNA from which it has been produced, such as by a reverse transcription
reaction, or by chemical synthesis of a DNA based upon knowledge of the RNA sequence. cDNA
also corresponds to the gene that encodes the RNA. Polynucleotides may be said to correspond
even when one of the pair is derived from only a portion of the other.

A "probe" when used in the context of polynucleotide manipulation refers to a polynucleotide
which is provided as a reagent to detect a target potentially present in a sample of interest by
hybridizing with the target. Usually, a probe will comprise a label or a means by which a label can be
attached, either before or subsequent to the hybridization reaction. Suitable labels include, but are not
limited to radioisotopes, fluorochromes, chemiluminescent compounds, dyes, and enzymes.

A "primer" is a short polynucleotide, generally with a free 3' -OH group, that binds to a target
potentially present in a sample of interest by hybridizing with the target, and thereafter promoting
polymerization of a polynucleotide complementary to the target. A "polymerase chain reaction"
("PCR") is a reaction in which replicate copies are made of a target polynucleotide using one or more
primers, and a catalyst of polymerization, such as a reverse transcriptase or a DNA polymerase, and
particularly a thermally stable polymerase enzyme. Methods for PCR are taught in U.S. Patent Nos.
4,683,195 (Mullis) and 4,683,202 (Mullis et al.). All processes of producing replicate copies of the
same polynucleotide, such as PCR or gene cloning, are collectively referred to herein as "replication."

An "operon" is a genetic region comprising a gene encoding a protein and functionally related
5' and 3' flanking regions. Elements within an operon include but are not limited to promoter regions,
enhancer regions, repressor binding regions, transcription initiation sites, ribosome binding sites,
translation initiation sites, protein encoding regions, introns and exons, and termination sites for
transcription and translation. A "promoter" is a DNA region capable under certain conditions of
binding RNA polymerase and initiating transcription of a coding region located downstream (in the 3'
direction) from the promoter. "Operably linked" refers to a juxtaposition of genetic elements, wherein
the elements are in a relationship permitting them to operate in the expected manner. For instance, a
promoter is operably linked to a coding region if the promoter helps initiate transcription of the coding
sequence. There may be intervening residues between the promoter and coding region so long as
this functional relationship is maintained.

"Gene duplication" is a term used herein to describe the process whereby an increased
number of copies of a particular gene or a fragment thereof is present in a particular cell or cell line.
"Gene amplification" generally is synonymous with gene duplication.

"Expression" is defined alternately in the scientific literature either as the transcription of a
gene into an RNA polynucleotide, or as the transcription and subsequent translation into a
polypeptide. As used herein, "expression" or "gene expression" generally refers to the production of
the RNA unless specified or required otherwise. Thus, "RNA overexpression" reflects the presence of
more RNA (as a proportion of total RNA) from a particular gene in a cell being described, such as a
cancerous cell, in relation to that of the cell it is being compared with, such as a non-cancerous cell.
The protein product of the gene may or may not be produced in normal or abnormal amounts.
"Protein overexpression" similarly reflects the presence of relatively more protein present in or
produced by, for example, a cancerous cell.

"Abundance" of RNA refers to the amount of a particular RNA present in a particular cell type.
Thus, "RNA overabundance" or "overabundance of RNA" describes RNA that is present in greater
proportion of total RNA in the cell type being described, compared with the same RNA as a proportion
of the total RNA in a control cell. A number of mechanisms may contribute to RNA overabundance in
a particular cell type: for example, gene duplication, increased level of transcription of the gene,
increased persistence of the RNA within the cell after it is produced, or any combination of these.
Similarly, "lower abundance" or "underabundance" describes RNA that is present in lower
proportion in the cell being described compared with a control cell.
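
The comparison underlying these terms can be sketched as a ratio of proportions. In the sketch below, each cell is summarized by a hypothetical pair of measurements (signal for the RNA of interest, signal for total RNA), for example densitometry values from a blot normalized to a loading control. The function names and the two-fold cutoff are illustrative assumptions; the definitions above do not fix any numerical threshold.

```python
def rna_proportion(gene_signal, total_rna_signal):
    """Amount of one RNA expressed as a fraction of total RNA."""
    if total_rna_signal <= 0:
        raise ValueError("total RNA signal must be positive")
    return gene_signal / total_rna_signal


def classify_abundance(test, control, threshold=2.0):
    """Compare a test cell with a control cell.

    test and control are (gene_signal, total_rna_signal) pairs.
    threshold is an illustrative fold cutoff, not a value from the text.
    Returns (ratio of proportions, descriptive label).
    """
    control_prop = rna_proportion(*control)
    if control_prop == 0:
        raise ValueError("control proportion is zero; ratio undefined")
    ratio = rna_proportion(*test) / control_prop
    if ratio >= threshold:
        label = "overabundant"
    elif ratio <= 1.0 / threshold:
        label = "underabundant"
    else:
        label = "comparable"
    return ratio, label
```

Because the comparison is made on proportions of total RNA, the result does not depend on how much total RNA was loaded for each cell.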

The terms "polypeptide", "peptide" and "protein" are used interchangeably herein to refer to
polymers of amino acids of any length. The polymer may be linear or branched, it may comprise
modified amino acids, and it may be interrupted by non-amino acids. The terms also encompass an
amino acid polymer that has been modified; for example, disulfide bond formation, glycosylation,
lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling
component.

In the context of polypeptides, a "linear sequence" or a "sequence" is an order of amino acids
in a polypeptide in an N-terminal to C-terminal direction in which residues that neighbor each other in
the sequence are contiguous in the primary structure of the polypeptide. A "partial sequence" is a
linear sequence of part of a polypeptide which is known to comprise additional residues in one or both
directions.

A linear sequence of amino acids is "essentially identical" to another sequence if the two
sequences have a substantial degree of sequence identity. It is understood that the functional
proteins can accommodate insertions, deletions, and substitutions in the amino acid sequence. Thus,
linear sequences of amino acids can be essentially identical even if some of the residues do not
precisely correspond or align. Sequences that correspond or align more closely to the invention
disclosed herein are more preferred. It is also understood that some amino acid substitutions are
more easily tolerated. For example, substitution of an amino acid with hydrophobic side chains,
aromatic side chains, polar side chains, side chains with a positive or negative charge, or side chains
comprising two or fewer carbon atoms, by another amino acid with a side chain of like properties can
occur without disturbing the essential identity of the two sequences. Methods for determining
homologous regions and scoring the degree of homology are well known in the art; see for example
Altschul et al. and Henikoff et al. Well-tolerated sequence differences are referred to as "conservative
substitutions". Thus, sequences with conservative substitutions are preferred over those with other
substitutions in the same positions; sequences with identical residues at the same positions are still
more preferred. In general, amino acid sequences that are essentially identical are at least about
15% identical, and comprise at least about another 15% which are either identical or are conservative
substitutions, after alignment of homologous regions. More preferably, essentially identical
sequences comprise at least about 50% identical residues or conservative substitutions; more
preferably, they comprise at least about 70% identical residues or conservative substitutions; more
preferably, they comprise at least about 80% identical residues or conservative substitutions; more
preferably, they comprise at least about 90% identical residues or conservative substitutions; more
preferably, they comprise at least about 95% identical residues or conservative substitutions; even
more preferably, they contain 100% identical residues.
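
The minimum threshold above can be sketched as a scoring step applied to two rows of an existing alignment. The side-chain classes below are one plausible grouping chosen for illustration; their exact membership, the function names, and the "-" gap symbol are assumptions and are not specified in this application. Established scoring schemes such as those of Altschul et al. and Henikoff et al. use substitution matrices and are not reproduced here.

```python
# Illustrative side-chain classes; the exact membership is an assumption.
SIDE_CHAIN_CLASSES = [
    set("AVLIM"),  # hydrophobic
    set("FWY"),    # aromatic
    set("STNQ"),   # polar, uncharged
    set("KRH"),    # positively charged
    set("DE"),     # negatively charged
    set("GAS"),    # side chain of two or fewer carbon atoms
]


def is_conservative(a, b):
    """True if two different residues share at least one side-chain class."""
    return a != b and any(a in cls and b in cls for cls in SIDE_CHAIN_CLASSES)


def score_alignment(ref_aln, cmp_aln, gap="-"):
    """Return (% identical, % conservative) over aligned residue pairs."""
    pairs = [(r, c) for r, c in zip(ref_aln.upper(), cmp_aln.upper())
             if r != gap and c != gap]
    if not pairs:
        return 0.0, 0.0
    identical = sum(1 for r, c in pairs if r == c)
    conservative = sum(1 for r, c in pairs if is_conservative(r, c))
    n = len(pairs)
    return 100.0 * identical / n, 100.0 * conservative / n


def essentially_identical(ref_aln, cmp_aln):
    """Minimum threshold from the text: at least about 15% identical, plus
    at least about another 15% that are identical or conservative."""
    ident, cons = score_alignment(ref_aln, cmp_aln)
    return ident >= 15.0 and ident + cons >= 30.0
```

For the hypothetical pair `"MKVLDE"` and `"MRILDQ"`, three of six positions are identical and two more (K/R and V/I) are conservative under this grouping, so the pair passes the minimum threshold.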

In determining whether polypeptide sequences are essentially identical, a sequence that
preserves the functionality of the polypeptide with which it is being compared is particularly preferred.
Functionality may be established by different parameters, such as enzymatic activity, the binding rate
or affinity in a receptor-ligand interaction, the binding affinity with an antibody, and X-ray
crystallographic structure.

An "antibody" (interchangeably used in plural form) is an immunoglobulin molecule capable of
specific binding to a target, such as a polypeptide, through at least one antigen recognition site,
located in the variable region of the immunoglobulin molecule. As used herein, the term
encompasses not only intact antibodies, but also fragments thereof, mutants thereof, fusion proteins,
humanized antibodies, and any other modified configuration of the immunoglobulin molecule that
comprises an antigen recognition site of the required specificity.

The term "antigen" refers to the target molecule that is specifically bound by an antibody
through its antigen recognition site. The antigen may, but need not be chemically related to the
immunogen that stimulated production of the antibody. The antigen may be polyvalent or it may be a
monovalent hapten. Examples of kinds of antigens that can be recognized by antibodies include
polypeptides, polynucleotides, other antibody molecules, oligosaccharides, complex lipids, drugs, and
chemicals. An "immunogen" is an antigen capable of stimulating production of an antibody when
injected into a suitable host, usually a mammal. Compounds may be rendered immunogenic by many
techniques known in the art, including crosslinking or conjugating with a carrier to increase valency,
mixing with a mitogen to increase the immune response, and combining with an adjuvant to enhance
presentation.

An "active vaccine" is a pharmaceutical preparation for human or animal use, which is used
with the intention of eliciting a specific immune response. The immune response may be either
humoral or cellular, systemic or secretory. The immune response may be desired for experimental
purposes, for the treatment of a particular condition, for the elimination of a particular substance, or for
prophylaxis against a particular condition or substance.

An "isolated" polynucleotide, polypeptide, protein, antibody, or other substance refers to a
preparation of the substance devoid of at least some of the other components that may also be
present where the substance or a similar substance naturally occurs or is initially obtained from.
Thus, for example, an isolated substance may be prepared by using a purification technique to enrich
it from a source mixture. Enrichment can be measured on an absolute basis, such as weight per
volume of solution, or it can be measured in relation to a second, potentially interfering substance
present in the source mixture. Increasing enrichments of the embodiments of this invention are
increasingly more preferred. Thus, for example, a 2-fold enrichment is preferred, 10-fold enrichment
is more preferred, 100-fold enrichment is more preferred, 1000-fold enrichment is even more
preferred. A substance can also be provided in an isolated state by a process of artificial assembly,
such as by chemical synthesis or recombinant expression.
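
The two ways of measuring enrichment described above can be sketched as simple ratios. The function names are hypothetical, and the inputs are assumed to be measured in consistent units before and after purification.

```python
def fold_enrichment_absolute(conc_before, conc_after):
    """Fold enrichment on an absolute basis, e.g. weight per volume of solution."""
    if conc_before <= 0:
        raise ValueError("starting concentration must be positive")
    return conc_after / conc_before


def fold_enrichment_relative(target_before, other_before, target_after, other_after):
    """Fold enrichment of a target relative to a second, potentially
    interfering substance: the target/other ratio after purification
    divided by the same ratio before purification."""
    if min(target_before, other_before, other_after) <= 0:
        raise ValueError("amounts used as divisors must be positive")
    return (target_after / other_after) / (target_before / other_before)


def preference_tier(fold):
    """Highest of the 2-, 10-, 100- and 1000-fold tiers that is met, or None."""
    for cutoff in (1000, 100, 10, 2):
        if fold >= cutoff:
            return cutoff
    return None
```

As a worked case, a target that makes up 1 part in 100 of a source mixture (1 part target to 99 parts of the interfering substance) and 1 part in 2 after purification (10 parts to 10) has been enriched about 99-fold relative to that substance, which meets the 10-fold tier but not the 100-fold tier.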

A polynucleotide used in a reaction, such as a probe used in a hybridization reaction, a primer
used in a PCR, or a polynucleotide present in a pharmaceutical preparation, is referred to as "specific"
or "selective" if it hybridizes or reacts with the intended target more frequently, more rapidly, or with
greater duration than it does with alternative substances. Similarly, an antibody is referred to as
"specific" or "selective" if it binds via at least one antigen recognition site to the intended target more
frequently, more rapidly, or with greater duration than it does to alternative substances. A
polynucleotide or antibody is said to "selectively inhibit" or "selectively interfere with" a reaction if it
inhibits or interferes with the reaction between particular substrates to a greater degree or for a
greater duration than it does with the reaction between alternative substrates. An antibody is capable
of "specifically delivering" a substance if it conveys or retains that substance near a particular cell type
more frequently or for a greater duration compared with other cell types.

The "effector component" of a pharmaceutical preparation is a component which modifies
target cells by altering their function in a desirable way when administered to a subject bearing the
cells. Some advanced pharmaceutical preparations also have a "targeting component", such as an
antibody, which helps deliver the effector component more efficaciously to the target site. Depending
on the desired action, the effector component may have any one of a number of modes of action. For
example, it may restore or enhance a normal function of a cell, it may eliminate or suppress an
abnormal function of a cell, or it may alter a cell's phenotype. Alternatively, it may kill or render
dormant a cell with pathological features, such as a cancer cell. Examples of effector components are
provided in a later section.

A "pharmaceutical candidate" or "drug candidate" is a compound believed to have therapeutic
potential, that is to be tested for efficacy. The "screening" of a pharmaceutical candidate refers to
conducting an assay that is capable of evaluating the efficacy and/or specificity of the candidate. In
this context, "efficacy" refers to the ability of the candidate to affect the cell or organism it is
administered to in a beneficial way; for example, the limitation of the pathology of cancerous cells.

A "cell line" or "cell culture" denotes higher eukaryotic cells grown or maintained in vitro. It is
understood that the descendants of a cell may not be completely identical (either morphologically,
genotypically, or phenotypically) to the parent cell. Cells described as "uncultured" are obtained
directly from a living organism, and have been maintained for a limited amount of time away from the
organism: not long enough or under conditions for the cells to undergo substantial replication.

"Genetic alteration" refers to a process wherein a genetic element is introduced into a cell
other than by mitosis or meiosis. The element may be heterologous to the cell, or it may be an
additional copy or improved version of an element already present in the cell. Genetic alteration
may be effected, for example, by transfecting a cell with a recombinant plasmid or other
polynucleotide through any process known in the art, such as electroporation, calcium phosphate
precipitation, or contacting with a polynucleotide-liposome complex, or by transduction or infection
with a DNA or RNA virus or viral vector. The alteration is preferably but not necessarily inheritable
by progeny of the altered cell.

A "host cell" is a cell which has been genetically altered, or is capable of being genetically
altered, by administration of an exogenous polynucleotide.

The terms "cancerous cell" or "cancer cell", used either in the singular or plural form, refer to
cells that have undergone a malignant transformation that makes them pathological to the host
organism. Malignant transformation is a single- or multi-step process, which involves in part an
alteration in the genetic makeup of the cell and/or the expression profile. Malignant transformation
may occur either spontaneously, or via an event or combination of events such as drug or chemical
treatment, radiation, fusion with other cells, viral infection, or activation or inactivation of particular
genes. Malignant transformation may occur in vivo or in vitro, and can if necessary be experimentally
induced.

A frequent feature of cancer cells is the tendency to grow in a manner that is uncontrollable
by the host, but the pathology associated with a particular cancer cell may take another form, as
outlined infra. Primary cancer cells (that is, cells obtained from near the site of malignant
transformation) can be readily distinguished from non-cancerous cells by well-established techniques,
particularly histological examination. The definition of a cancer cell, as used herein, includes not only
a primary cancer cell, but any cell derived from a cancer cell ancestor. This includes metastasized
cancer cells, and in vitro cultures and cell lines derived from cancer cells.

The "pathology" caused by a cancer cell within a host is anything that compromises the
well-being or normal physiology of the host. This may involve (but is not limited to) abnormal or
uncontrollable growth of the cell, metastasis, release of cytokines or other secretory products at an
inappropriate level, manifestation of a function inappropriate for its physiological milieu, interference
with the normal function of neighboring cells, aggravation or suppression of an inflammatory or
immunological response, or the harboring of undesirable chemical agents or invasive organisms.

"Treatment" of an individual or a cell is any type of intervention in an attempt to alter the
natural course of the individual or cell. For example, treatment of an individual may be undertaken to
decrease or limit the pathology caused by a cancer cell harbored in the individual. Treatment includes
(but is not limited to) administration of a composition, such as a pharmaceutical composition, and may
be performed either prophylactically, or subsequent to the initiation of a pathologic event or contact
with an etiologic agent. Effective amounts used in treatment are those which are sufficient to
produce the desired effect, and may be given in single or divided doses.

A "control cell" is an alternative source of cells or an alternative cell line used in an experiment
for comparison purposes. Where the purpose of the experiment is to establish a base line for gene
copy number or expression level, it is generally preferable to use a control cell that is not a cancer
cell.

The term "cancer gene" as used herein refers to any gene which is yielding transcription or
translation products at a substantially altered level or in a substantially altered form in cancerous cells
compared with non-cancerous cells, and which may play a role in supporting the malignancy of the
cell. It may be a normally quiescent gene that becomes activated (such as a dominant
proto-oncogene), it may be a gene that becomes expressed at an abnormally high level (such as a
growth factor receptor), it may be a gene that becomes mutated to produce a variant phenotype, or it
may be a gene that becomes expressed at an abnormally low level (such as a tumor suppressor
gene). The present invention is directed towards the discovery of genes in all these categories.

It is understood that a "clinical sample" encompasses a variety of sample types obtained from
a subject and useful in an in vitro procedure, such as a diagnostic test. The definition encompasses
solid tissue samples obtained as a surgical removal, a pathology specimen, or a biopsy specimen,
tissue cultures or cells derived therefrom and the progeny thereof, and sections or smears prepared
from any of these sources. Non-limiting examples are samples obtained from breast tissue, lymph
nodes, and tumors. The definition also encompasses blood, spinal fluid, and other liquid samples of
biologic origin, and may refer to either the cells or cell fragments suspended therein, or to the liquid
medium and its solutes.

The term "relative amount" is used where a comparison is made between a test
measurement and a control measurement. Thus, the relative amount of a reagent forming a complex
in a reaction is the amount reacting with a test specimen, compared with the amount reacting with a
control specimen. The control specimen may be run separately in the same assay, or it may be part
of the same sample (for example, normal tissue surrounding a malignant area in a tissue section).

A "differential" result is generally obtained from an assay in which a comparison is made
between the findings of two different assay samples, such as a cancerous cell line and a control cell
line. Thus, for example, "differential expression" is observed when the level of expression of a
particular gene is higher in one cell than another. "Differential display" refers to a display of a
component, particularly RNA, from different cells to determine if there is a difference in the level of the
component amongst different cells. Differential display of RNA is conducted, for example, by selective
production and display of cDNA corresponding thereto. A method for performing differential display is
provided in a later section.

A polynucleotide derived from or corresponding to CH1-9a11-2, CH8-2a13-1, CH13-2a12-1,
or CH14-2a16-1 is any of the following: the respective cDNA fragments, the corresponding
messenger RNA, including splice variants and fragments thereof, both strands of the corresponding
full-length cDNA and fragments thereof, and the corresponding gene. Isolated allelic variants of any
of these forms are included. This invention embodies any polynucleotide corresponding to
CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, or CH14-2a16-1 in an isolated form. It also embodies any such
polynucleotide that has been cloned or transfected into a cell line.

When used in referring to the gene screening methods of this invention (such as those
outlined in the last paragraph), "displaying cDNA" is any technique in which DNA copies of RNA
(not restricted to mRNA) are rendered detectable in a quantitative or relatively quantitative fashion, in
that DNA copies present in a relatively greater amount in a first sample compared with a second
sample generate a relatively stronger or weaker signal compared with that of the second sample
due to the difference in copy number. Separate display of different cDNA in a preparation
(particularly but not limited to cDNA of different size) allows comparison of levels of a particular
cDNA between different samples. A preferred method of display is the differential display
technique, and enhancements thereupon described in this disclosure and elsewhere.

The term "digested" DNA encompasses DNA (particularly chromosomal DNA) that has
been fragmented by any suitable chemical or enzymatic means into fragments conveniently
separable by standard techniques, particularly gel electrophoresis. Digestion with a restriction
endonuclease specific for a particular nucleotide sequence is preferred.

"Hybridizing" in this context refers to contacting a first polynucleotide with a second
polynucleotide under conditions that permit the formation of a multi-stranded polynucleotide duplex
whenever one strand of the first polynucleotide has a sequence of sufficient complementarity to a
sequence on the second polynucleotide. The duplex may be a long-lived one, such as when one
DNA molecule is used as a labeled probe to detect another DNA molecule, that may optionally be
bound to a nitrocellulose filter or present in a separating gel. The duplex may also be a shorter-lived
one, such as when one DNA molecule is used to prime an amplification reaction of the other
DNA molecule, and the amplified product is subsequently detected. The practitioner may alter the
conditions of the reaction to alter the degree of complementarity required, as long as sequence
specificity remains a determining factor in the reaction.

Unless explicitly indicated or otherwise required by the techniques used, the steps of a
method of this invention may be performed in any order, or combined where desired and
appropriate. In one example, in the method comprising steps a) through h) that is described
above, it is entirely appropriate to conduct steps a) to c) of the method either before or after steps
e) to g) of the method, as long as the cDNA ultimately selected fulfills the criteria of both step d)
and step h). In another example, screening against different digested DNA preparations, even if
outlined separately, may optionally be done at the same time. All permutations of this kind are
within the scope of the invention.

General methods

The practice of the present invention will employ, unless otherwise indicated, conventional
techniques of molecular biology, microbiology, recombinant DNA, and immunology, which are within
the skill of the art. Such techniques are explained fully in the literature. See, for example, "Molecular
Cloning: A Laboratory Manual", Second Edition (Sambrook, Fritsch & Maniatis, 1989);
"Oligonucleotide Synthesis" (M.J. Gait, ed., 1984); "Animal Cell Culture" (R.I. Freshney, ed., 1987);
the series "Methods in Enzymology" (Academic Press, Inc.); "Handbook of Experimental Immunology"
(D.M. Weir & C.C. Blackwell, eds.); "Gene Transfer Vectors for Mammalian Cells" (J.M. Miller & M.P.
Calos, eds., 1987); "Current Protocols in Molecular Biology" (F.M. Ausubel et al., eds., 1987); and
"Current Protocols in Immunology" (J.E. Coligan et al., eds., 1991). All patents, patent applications,
articles and publications mentioned herein, both supra and infra, are hereby incorporated herein by
reference.

Features of the cancer gene screening method


The cancer gene screening methods of this invention may be brought to bear to discover
novel genes associated with cancer. Exemplars of cancer-associated genes identified by this
method are described below. The exemplars were identified using breast cancer cell lines and
tissue, but the strategy can be applied to any cancer type of interest.

A central feature of the cancer gene screening method of this invention is to look for both
DNA duplication and RNA overabundance relating to the same gene. This feature is particularly
powerful in the discovery of new and potentially important cancer genes. While amplicons occur
frequently in cancer, the presently available techniques indicate only the broad chromosomal
region involved in the duplication event, not the specific genes involved. The present invention
provides a way of detecting genes that may be present in an amplicon from a functional basis.
Because an early part of the method involves detecting RNA, the method avoids genes that may
be duplicated in an amplicon but are quiescent (and therefore irrelevant) in the cancer cells.
Furthermore, it recruits active genes from a duplicated region of the chromosome too small to be
detectable by the techniques used to describe amplicons.

Near the heart of this approach are several concepts. One is that genes encoding
products implicated positively in the malignant process achieve elevated gene expression as a part
of malignant transformation. In this context, "gene expression" refers to expression at the RNA
transcription level. Most typically, the RNA is in turn translated into a protein with a particular
enzymatic, binding, or regulatory activity which increases after malignant transformation. In a less
common example, the RNA may encode or participate as a ribozyme, antisense polynucleotide, or
other functional nucleic acid molecule during malignancy. In a third example, RNA expression may
be incidental but symptomatic of an important event in transformation.

Another concept is that overexpression, if central to malignant transformation, may be
achieved in different tumors by different mechanisms, and that at least one such possible
mechanism is gene duplication. Accordingly, a substantial proportion of transformed cells will have
an amplicon, or duplicated region of a chromosome, that includes within its compass the
overexpressed gene. Other transformed cells may achieve RNA overabundance without gene
duplication, such as by increasing the rate of transcription of the gene (e.g., by upregulation of the
promoter region), by enhancing transcript promotion or transport, or by increasing mRNA survival.

Thus, the method entails screening, at the RNA level, several cancer cell lines or tumors
and several normal cell lines or tissue samples at the same time. RNA are selected that show a
consistent elevation amongst the cancer cells as compared with normal cells. Additional strategies
may be employed in combination with the RNA screening to improve the success rate of the
method. One such strategy is to use several cancer cell lines that are all known to have duplicated
genes in the same region of a particular chromosome. Thus, the RNA that emerge from the screen
are more likely to represent a deliberate overexpression event, and the overexpressed gene is
likely to be within the duplicated region. A supplemental strategy is to use freshly prepared tissue
samples rather than cell lines as controls for base-line expression. This avoids selection of genes
that may alter their expression level just as a result of tissue culturing. Another supplemental
strategy is to conduct an additional level of screening, following identification of shared,
overexpressed RNA. The selected RNA are used to screen DNA from suitable cancer cells and
normal cells, to ensure that at least a proportion of the cells achieved the overexpression by way of
gene duplication.

The strategy for detecting such genes comprises a number of innovations over those that
have been used in previous work.

The first part of the method is based on a search for particular RNAs that are overabundant
in cancer cells. A first innovation of the method is to compare RNA abundance between control
cells and several different cancer cells or cancer cell lines of the desired type. The cDNA
fragments that emerge in a greater amount in several different cancer lines, but not in control cells,
are more likely to reflect genes that are important in disease progression, rather than those that
have undergone secondary or coincidental activation. It is particularly preferred to use cancer cells
that are known to share a common duplicated chromosomal region.

A second innovation of this method is to supply as control, not RNA from a cell line or
culture, but from fresh tissue samples of non-malignant origin. There are two reasons for this.
First, the tissue will provide the spectrum of expression that is typical to the normal cell phenotype,
rather than individual differences that may become more prominent in culture. This establishes a
more reliable baseline for normal expression levels. More importantly, the tissue will be devoid of
the effects that in vitro culturing may have in altering or selecting particular phenotypes. For
example, proto-oncogenes or growth factors may become up-regulated in culture. When cultured
cells are used as the control for differential display, these up-regulated genes would be missed.

A third innovation of this method is to undertake a subselection for cDNA corresponding to
genes that achieve their RNA overabundance in a substantial proportion of cancer cells by gene
duplication. To accomplish this, appropriate cDNA corresponding to overabundant RNA identified
in the foregoing steps are used to probe digests of cellular DNA from a panel of different cancer
cells, and from normal genomic DNA. cDNA that shows evidence of higher copy numbers in a
proportion of the panel are selected for further characterization. An additional advantage of this
step is that cDNA corresponding to mitochondrial genes can rapidly be screened away by including
a mitochondrial DNA digest as an additional sample for testing the probe. This eliminates most of
the false-positive cDNA, which otherwise make up a majority of the cDNA identified.
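The subselection just described can be sketched as a filter over hybridization signals. This is an illustrative sketch under stated assumptions, not the procedure of the Examples: the signal values, the sample labels, the 2-fold copy-number threshold, the 25% panel fraction, and the mitochondrial cutoff are all hypothetical.

```python
def select_duplicated(signals, panel, normal="normal", mito="mito",
                      min_ratio=2.0, min_fraction=0.25, mito_cutoff=0.0):
    """Keep probes that look duplicated in part of a cancer panel.

    signals maps probe name -> {sample label: hybridization signal}.
    A probe is dropped if it hybridizes to the mitochondrial DNA digest.
    """
    selected = []
    for probe, by_sample in signals.items():
        if by_sample.get(mito, 0.0) > mito_cutoff:
            continue  # likely mitochondrial cDNA: screened away
        baseline = by_sample[normal]
        if baseline <= 0:
            continue  # no usable signal on normal genomic DNA
        duplicated = sum(
            1 for cell in panel if by_sample.get(cell, 0.0) / baseline >= min_ratio
        )
        if duplicated / len(panel) >= min_fraction:
            selected.append(probe)
    return selected


# Hypothetical signals for three cDNA probes on a three-member cancer panel.
signals = {
    "probe_1": {"normal": 1.0, "mito": 0.0, "x": 4.0, "y": 1.1, "z": 3.0},
    "probe_2": {"normal": 1.0, "mito": 5.0, "x": 6.0, "y": 6.0, "z": 6.0},
    "probe_3": {"normal": 1.0, "mito": 0.0, "x": 1.0, "y": 0.9, "z": 1.2},
}
print(select_duplicated(signals, panel=["x", "y", "z"]))
```

Here only `probe_1` passes: `probe_2` is removed by the mitochondrial digest, and `probe_3` shows no copy-number increase in any panel member.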

Thus, the identification of genes yielding products that are present at abnormal levels is
accomplished by a method comprised of the following steps.

To identify particular RNA that is overabundant in cancer cells, RNA is prepared from both
cancerous and control cells by standard techniques. Cancer-associated genes may affect cellular
metabolism by any one of a number of mechanisms. For example, they may encode ribozymes,
anti-sense polynucleotides, DNA-binding polynucleotides, altered ribosomal RNA, and the like.
The gene screening methods of this invention may employ a comparison of RNA abundance levels
at the total RNA level, not strictly limited to mRNA. However, the vast majority of
cancer-associated genes are predicted to encode a protein whose up-regulation is closely linked to
the metabolic process. For example, the four exemplary breast cancer genes described elsewhere
in this application all comprise an open reading frame. Accordingly, a focus on mRNA enriches the
selectable pool for candidate cancer-associated genes. Focus towards mRNA can be conducted
at any step in the method. It is particularly convenient to use a display method that displays cDNA
copied only from mRNA. In this case, whole RNA may be prepared and analyzed from cancer and
control cell populations without separating out mRNA.

In terms of the cancer cells used as an RNA source, it is particularly advantageous to use
a plurality of cancer cells known to contain a duplicated gene or chromosomal segment in the same
region of the chromosome. The duplicated segment need not be the same size in all the cells, nor
is it necessary that the number of duplications be the same, so long as there is at least some part
of the duplicated segment that is shared amongst all the cancer cells used in the screen. Thus, a
minimum of two, and preferably at least three cancer cells are used that are sufficiently
characterized to identify a shared duplicated region, and can be used as a source of RNA for the
screening test. In contrast, the control cell population will not comprise chromosomal duplications.

Assuming the duplication to be related to the malignancy of the cancer cells, RNA
transcribed from the duplicated region is expected to be overabundant compared with that of the
control cell. Accordingly, a highly effective strategy is to identify overabundant RNA that is present
in all (or at least several) of the cancer cell preparations, but none of the control preparations. By
using cancer cells that share a duplicated chromosomal region, the RNA comparison will be
strongly biased in favor of RNA overabundance transcribed from the shared duplicated region.
Since the shared region is optimally only a small segment of a single chromosome, expression
differences arising from elsewhere in the genome in one cancer cell or another will not be selected.
We have found that this is highly effective in eliminating: a) RNA abundance differences resulting
from normal metabolic variations between cells; and/or b) RNA abundance differences related to
cancer cell malignancy, but occurring secondarily to malignant transformation. This is important
because it considerably minimizes the chief deficiency in the use of RNA comparison methods,
particularly differential display, for the screening of potential cancer genes: namely, the onerous
number of false-positives that such techniques generate.
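The selection rule stated above (overabundant in all, or at least several, of the cancer preparations but in none of the controls) reduces to simple set logic once each preparation has been scored. The following is a minimal sketch; the band identifiers and preparation labels are hypothetical, and how a band is scored as overabundant is left to the display method in use.

```python
from collections import Counter


def select_shared(cancer_calls, control_calls, min_cancer=None):
    """Return bands scored in at least min_cancer cancer preparations
    and in no control preparation.

    cancer_calls and control_calls map preparation label -> set of bands.
    By default a band must appear in every cancer preparation.
    """
    if min_cancer is None:
        min_cancer = len(cancer_calls)
    counts = Counter(band for calls in cancer_calls.values() for band in set(calls))
    in_controls = set().union(*control_calls.values())
    return sorted(
        band for band, n in counts.items()
        if n >= min_cancer and band not in in_controls
    )


# Hypothetical scoring of four bands across three cancer and two control lanes.
cancer = {"c1": {"b1", "b2", "b3"}, "c2": {"b1", "b3"}, "c3": {"b1", "b2", "b4"}}
control = {"n1": {"b3"}, "n2": set()}

print(select_shared(cancer, control))                # ['b1']
print(select_shared(cancer, control, min_cancer=2))  # ['b1', 'b2']
```

Relaxing `min_cancer` corresponds to the "at least several" alternative in the text; band `b3` is excluded in both cases because it also appears in a control lane.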

Shared duplicated regions in cancer cells may be identified by a relevant analytical
technique, or by reference to such analysis already conducted and published. One approach that
has been highly effective in mapping approximate sub-chromosomal locations of duplicated
segments is comparative genomic hybridization (CGH). This technique involves extracting,
amplifying and labeling DNA from the subject cell; hybridizing to reference metaphase
chromosomes treated to remove repetitive sequences; and observing the position of the hybridized
DNA on the chromosomes (WO 93/18186; Gray et al.). The greater the signal intensity at a given
position, the greater the copy number of the sequences in the subject cell. Thus, regions showing
elevated staining correspond to genes duplicated in the cancer cells, while regions showing
diminished staining correspond to genes deleted in the cancer cells. Related techniques of which a
practitioner in the art will be well aware are methods for preparing and using repeat sequence
chromosome-specific nucleic acid probes (US 5,427,932; Weier et al.), methods for staining target
chromosomal DNA using labeled nucleic acid fragments in conjunction with blocking fragments
complementary to repetitive DNA segments (US 5,447,841; Gray et al.), and methods for detecting
amplified or deleted chromosomal regions using a mapped library of labeled polynucleotide probes
(US 5,472,842; Stokke et al.). If desired, multiple fluorochromes can be used as labeling agents
with CGH and related techniques, to provide a three-color visualization of deleted, normal, and
duplicated chromosome abnormalities (Lucas et al.).
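The relationship between CGH signal and copy number can be illustrated by classifying each region from its test-to-reference signal ratio. This is a sketch only: the text states the direction of the relationship but gives no cutoffs, so the thresholds of 1.25 and 0.75 and the region names below are assumptions chosen for illustration.

```python
def classify_cgh(ratios, gain=1.25, loss=0.75):
    """Label each region as duplicated, deleted, or normal.

    ratios maps region name -> (subject signal / reference signal).
    The gain and loss thresholds are illustrative assumptions.
    """
    calls = {}
    for region, ratio in ratios.items():
        if ratio >= gain:
            calls[region] = "duplicated"   # elevated staining
        elif ratio <= loss:
            calls[region] = "deleted"      # diminished staining
        else:
            calls[region] = "normal"
    return calls


# Hypothetical ratios for three chromosomal regions.
example = {"region_1": 1.8, "region_2": 1.0, "region_3": 0.5}
print(classify_cgh(example))
```

The three output labels correspond to the deleted, normal, and duplicated categories of the three-color visualization mentioned above.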

The choice of a particular chromosomal mapping approach is irrelevant, especially once
the location of the duplicated region is known. If the location of the chromosome duplication is
already established for a cell line to be used in RNA comparison during the course of the present
invention, then it is unnecessary to conduct a mapping technique de novo. For example,
established cancer cell lines exist for which mapping data is already available in the public domain.
Provided in the reference section of this application is a list of over 40 articles in which the
locations of duplicated regions in particular cancer cells are described. In the context of the
present invention, a plurality of cancer cells is chosen for the screening panel based on such data,
so that they share a duplicated chromosomal region. The chromosomal location of a suspected
duplication may be confirmed by hybridization analysis, if desired, using a probe specific for the
location.
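Choosing a panel that shares a duplicated region amounts to intersecting the mapped amplicons of the candidate cells. The following is a minimal sketch, assuming each duplication has already been mapped to a chromosome and an approximate start and end position; the cell labels and coordinates are hypothetical and do not describe any real cell line.

```python
def intersect(regions_a, regions_b):
    """Overlap between two lists of (chromosome, start, end) regions."""
    shared = []
    for chrom_a, start_a, end_a in regions_a:
        for chrom_b, start_b, end_b in regions_b:
            if chrom_a == chrom_b:
                start, end = max(start_a, start_b), min(end_a, end_b)
                if start < end:
                    shared.append((chrom_a, start, end))
    return sorted(shared)


def shared_regions(amplicons):
    """Regions duplicated in every cell of the panel.

    amplicons maps cell label -> list of (chromosome, start, end).
    The segments need not be the same size in each cell; only the
    common part is returned.
    """
    per_cell = list(amplicons.values())
    shared = per_cell[0]
    for regions in per_cell[1:]:
        shared = intersect(shared, regions)
    return shared


# Hypothetical mapping data for a three-member panel (arbitrary units).
panel = {
    "line_A": [("17", 30, 50), ("20", 10, 20)],
    "line_B": [("17", 40, 60)],
    "line_C": [("17", 35, 45), ("8", 1, 5)],
}
print(shared_regions(panel))  # [('17', 40, 45)]
```

An empty result would indicate that the candidate cells have no duplicated segment in common and that a different panel should be chosen.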

The cancer cells used for RNA comparison are also generally (but not necessarily) derived
from the same type of cancer or the same tissue. Using cells derived from the same type of cancer
increases the probability that the gene ultimately identified will be common in that type of cancer,
and suitable as a type-specific diagnostic marker. Using cells derived from different types of
cancer is in effect a search for cancer-related genes that are less tissue specific and more related
to the malignant process in general. Both types of genes are of interest for both diagnostic and
therapeutic purposes. In one illustration highlighted in Example 1, RNA was screened from the
three breast cancer cell lines BT474, SKBR3, and MCF7, which have been determined by CGH or
Southern analysis to share duplicated genetic regions in chromosomes 1, 8, 14, 17, and 20.
When the RNA from these cells was displayed, a number of RNA were found to be overabundant
in the cancer cells, but not controls (Figure 1). Three RNA overabundant in all three cancer cell
lines corresponded to cancer-associated genes located on chromosomes 1, 8, and 14 that are
listed in Table 1. The chromosome 13 gene (CH13-2a12-1) was overexpressed in 2 of the 3 cell
lines; namely BT474 and SKBR3. Southern analysis subsequently established that the
chromosome 13 gene was duplicated in the same two cell lines (Example 6, Table 5).

Selection of the source or sources of control cell RNA is also a matter of some refinement.
The control RNA can be derived from in vitro cultures of non-malignant cells, or established cell
lines derived from a non-malignant source. However, it is preferable for the control RNA to be
obtained directly from normal human tissue of the same type as the cancer cells. This is because
most normal cells do not proliferate indefinitely; hence adaptation of a cell into a cell line involves a
degree of transformation. The transforming event may, in turn, be shared with that of certain
cancer cells, at least at the level of RNA abundance. Hence, comparison of the RNA levels in
cancer cells with so-called control cell lines may lead the practitioner to miss genes that are related
to malignancy. For convenience, control cells may be maintained in culture for a brief period
before the experiment, and even stimulated; however, multiple rounds of cell division are to be
avoided if possible. Use of both stimulated and unstimulated cells as controls may help provide
RNA patterns corresponding to the normal range of abundance within various metabolic events of
the cell cycle. In one illustration highlighted in Example 1, RNA was screened using both
proliferating and non-proliferating cells. As stated, the screening of breast cancer RNA is
preferably conducted using uncultured normal mammary epithelial cells (termed "organoids") as
sources of control RNA. These cells may be obtained from surgical samples resected from healthy
breast tissue.

The RNA is preserved until use in the comparison experiment in such a way as to minimize
fragmentation. To facilitate confirmation experiments, it is useful to use RNA of a reproducible
character. For this reason, it is convenient to use RNA that has been obtained from stable
cancerous cell lines and/or ready tissue sources, although reproducibility can also be provided by
preparing enough RNA so that it can be preserved in aliquots.

For displaying relative overabundance of RNA in the cancer cells, compared with the
control cells, many standard techniques are suitable. These would include any form of subtractive
hybridization or comparative analysis. Preferred are techniques in which more than two RNA
sources are compared at the same time, such as various types of arbitrarily primed PCR
fingerprinting techniques (Welsh et al., Yoshikawa et al.). Particularly preferred are differential
mRNA display methods and variations thereof, in which the samples are run in neighboring lanes in
a separating gel. These techniques are focused towards mRNA by using primers that are specific
for the poly-A tail characteristic of mRNA (Liang et al., 1992a; U.S. Patent 5,262,311).

Because many thousands of genes are expressed in the cells of higher organisms at any
one time, it is preferable to improve the legibility of the display by surveying only a subset of the
RNA at a time. Methods for accomplishing this are known in the art. A preferred method is by
using selective primers that initiate PCR replication for a subset of the RNA. Thus, the RNA is first
reverse transcribed by standard techniques. Short primers are used for the selection, preferably
chosen such that alternative primers used in a series of like assays can complete a comprehensive
survey of the mRNA.

In a preferred example, primers can be used for the 3' region of the mRNAs which have an
oligo-dT sequence, followed by two other nucleotides (TiNM, where i = 11, N ∈ {A,C,G}, and M ∈
{A,C,G,T}). Thus, 12 possible primers are required to complete the survey. A random or arbitrary
primer of minimal length can then be used for replication towards what corresponds in the
sequence to the 5' region of the mRNA. The optimal length for the random primer is about 10
nucleotides. The product of the PCR reaction is labeled with a radioisotope, such as 35S. The
labeled cDNA is then separated by molecular weight, such as on a polyacrylamide sequencing gel.
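The count of 12 anchored primers follows directly from the two variable positions: three choices for N and four for M. A short enumeration makes this concrete. This is an illustrative sketch; the function name is hypothetical, and the oligo-dT run length of 11 is taken as an assumed default.

```python
from itertools import product


def anchored_primers(t_run=11):
    """Enumerate oligo-dT primers of the form T(i)NM.

    N is any base except T, so that the primer anchors at the start of
    the poly-A tail; M is any of the four bases.
    """
    first = "ACG"    # N
    second = "ACGT"  # M
    return ["T" * t_run + n + m for n, m in product(first, second)]


primers = anchored_primers()
print(len(primers))  # 12
for primer in primers:
    print(primer)
```

Each primer selects a different subset of the mRNA, so running the full series in separate reactions gives a comprehensive survey.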

If desired, variations on the differential display technique may be employed. For example,
one-base oligo-dT primers may be used (Liang et al., 1993 & 1994), although this is generally less
preferred because the display pattern is correspondingly more complex. Selection of primers may
be optimized mathematically depending on the number of RNA species in a tissue of interest
(Bauer et al.). The method may be adapted for non-denaturing gels, and for use with automatic
DNA sequencers (Bauer et al.). Alternative radioisotopes (Trentmann et al.) or fluorochromes (Sun
et al.) may be used for labeling the differential display. Differential display may optionally be
combined with a ribonuclease protection assay (Yeatman et al.). PCR primers may optionally
incorporate a restriction site to facilitate cloning (Linskens et al., Ayala et al.). Using Taq
polymerase from multiple manufacturers can increase the amount of variation under otherwise
identical conditions (Haag et al.). Nested PCR primers may be used in differential display to
decrease background created by oligo-dT primers (WO 95/33760). Other variants of the
differential display technique are known in the art and described inter alia in the references cited in
this disclosure. The use of such modifications is within the scope of the present invention, but is
not required, as evidenced by the examples described below.

Based on the comparison of relative abundance of RNA, particular RNAs are chosen which
are present as a higher proportion of the RNA in cancerous cells, compared with control cells.
When using the differential display method, the cDNA corresponding to overabundant RNA will
produce a band with greater proportional intensity amongst neighboring cDNA bands, compared
with the proportional intensity in the control lanes. Desired cDNAs can be recovered most directly
by cutting the spot in the gel corresponding to the band, and recovering the DNAs therefrom.
Recovered cDNA can be replicated again for further use by any technique or combination of
techniques known in the art, including PCR and cloning into a suitable carrier.
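
The comparison of proportional band intensity can be illustrated numerically. The sketch below assumes that band intensities have already been quantified (for example by densitometry) and matched between lanes. The band labels, the two-fold threshold, and the requirement of two cancer lanes are hypothetical values chosen for illustration, not parameters stated in this disclosure.

```python
def proportional_intensities(lane):
    """Express each band as a fraction of the total signal in its lane."""
    total = sum(lane.values())
    return {band: value / total for band, value in lane.items()}


def overabundant_bands(control_lane, cancer_lanes, min_fold=2.0, min_lanes=2):
    """Return bands whose proportional intensity exceeds the control by min_fold
    in at least min_lanes cancer lanes. Thresholds are illustrative assumptions.
    Bands absent from the control lane are not considered by this sketch.
    """
    control = proportional_intensities(control_lane)
    cancer = [proportional_intensities(lane) for lane in cancer_lanes]
    hits = []
    for band, control_fraction in control.items():
        if control_fraction <= 0:
            continue
        lanes_over = sum(
            1 for lane in cancer
            if lane.get(band, 0.0) / control_fraction >= min_fold
        )
        if lanes_over >= min_lanes:
            hits.append(band)
    return hits


# Hypothetical densitometry readings for four neighboring bands.
control = {"b1": 100, "b2": 100, "b3": 100, "b4": 100}
cancer_a = {"b1": 100, "b2": 500, "b3": 100, "b4": 100}
cancer_b = {"b1": 200, "b2": 1200, "b3": 200, "b4": 200}
print(overabundant_bands(control, [cancer_a, cancer_b]))  # ['b2']
```

Normalizing within each lane means that a lane loaded with more total cDNA (such as cancer_b here) does not by itself produce a hit.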

An optional but highly beneficial additional screening step, typically performed
subsequently to an RNA comparison as described above, is aimed at identifying genes that are
duplicated in a substantial proportion of cancers. This is conducted by using cDNA such as
selected from differential display to probe digests of chromosomal DNA obtained from two or more
cancerous cells, such as cancer cell lines. Chromosomal DNA from non-cancerous cells that
essentially reflects the germ line in terms of gene copy number is used for the control. A preferred
source of control DNA in experiments for human cancer genes is placental DNA, which is readily
obtainable. The DNA samples are cleaved at sequence-specific sites along the chromosome, most
usually with a suitable restriction enzyme, into fragments of appropriate size. The DNA can be
blotted directly onto a suitable medium, or separated on an agarose gel before blotting. The latter
method is preferred, because it enables a comparison of the hybridizing chromosomal restriction
fragment to determine whether the probe is binding to the same fragment in all samples. The
amount of probe binding to DNA digests from each of the cancer cells is compared with the amount
binding to control DNA.

Because the comparison is quantitative, it is preferable to standardize the measurement
internally. One method is to administer a second probe to the same blot, probing for a second
chromosomal gene unlikely to be duplicated in the cancer cells. This method is preferred, because
it standardizes not only for differences in the amount of DNA provided, but also for differences in
the amount transferred during blotting. This can be accomplished by using alternative labels for
the two probes, or by stripping the first probe with a suitable eluant before administering the
second.
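
The internal standardization described above amounts to a ratio of ratios. The following sketch shows the arithmetic under the assumption that signal for both probes has been quantified in the same lane; the function name and the example readings are hypothetical.

```python
def relative_copy_number(test_signal, reference_signal,
                         control_test_signal, control_reference_signal):
    """Estimate gene copy number in a cancer sample relative to control DNA.

    Dividing by the reference-probe signal in the same lane corrects for
    differences in DNA loading and transfer; dividing by the control ratio
    expresses the result relative to germ-line copy number.
    """
    sample_ratio = test_signal / reference_signal
    control_ratio = control_test_signal / control_reference_signal
    return sample_ratio / control_ratio


# Hypothetical readings: the cancer lane received 1.5 times as much DNA as the control lane.
print(relative_copy_number(900, 150, 200, 100))  # 3.0
```

In this example the raw test-probe signal is 4.5 times the control signal, but after correcting for loading the estimated duplication is three-fold.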

To eliminate cDNA for mitochondrial genes, it is preferable to include in a parallel analysis
a mitochondrial DNA preparation digested with the same restriction enzyme. Any cDNA probe that
hybridizes to the appropriate mitochondrial restriction fragments can be suspected of
corresponding to a mitochondrial gene.

In the initial replication of the RNA, the random primer may bind at any location along the
RNA sequence. Thus, the copied and replicated segment may be a fragment of the full-length
RNA. Longer cDNA corresponding to a greater portion of the sequence can be obtained, if
desired, by several techniques known to practitioners of ordinary skill. These include using the
cDNA fragment to isolate the corresponding RNA, or to isolate complementary DNA from a cDNA
library of the same species. Preferably, the library is derived from the same tissue source, and
more preferably from a cancer cell line of the same type. For example, for cDNA corresponding to
human breast cancer genes, a preferred library is derived from breast cancer cell line BT474,
constructed in lambda GT10.

Sequences of the cDNA can be determined by standard techniques, or by submitting the
sample to commercial sequencing services. The chromosomal locations of the genes can be
determined by any one of several methods known in the art, such as in situ hybridization using
chromosomal smears, or panels of somatic cell hybrids of known chromosomal composition.

The cDNA obtained through the selection process outlined can then be tested against a
larger panel of cancer cell lines and/or fresh tumor cells to determine what proportion of the cells
have duplicated the gene. This can be accomplished by using the cDNA as a probe for
chromosomal DNA digests, as described earlier. As illustrated in the Example section, a preferred
method for conducting this determination is Southern analysis.

The cDNA can also be used to determine what proportion of the cells have RNA
overabundance. This can be accomplished by standard techniques, such as slot blots or blots of
agarose gels, using whole RNA or messenger RNA from each of the cells in the panel. The blots
are then probed with the cDNA using standard techniques. It is preferable to provide an internal
loading and blotting control for this analysis. A preferred method is to re-probe the same blot for
transcripts of a gene likely to be present in about the same level in all cells of the same type, such
as the gene for a cytoskeletal protein. Thus, a preferred second probe is the cDNA for beta-actin.

Using a novel cDNA found by this selection procedure, it is anticipated that essentially all
cancer cells showing gene duplication will also show RNA overabundance, but that some will show
RNA overabundance without gene duplication.
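
Once each member of the panel has been scored for gene duplication and for RNA overabundance, the proportions of interest can be tallied as sketched below. The cell line names and scores are hypothetical placeholders, not data from this disclosure.

```python
from collections import Counter


def summarize_panel(panel):
    """Summarize a panel mapping cell line name -> (duplicated, overabundant).

    Both values are booleans scored beforehand, for example from DNA and RNA blots.
    """
    counts = Counter(panel.values())
    n = len(panel)
    return {
        "duplicated": sum(1 for dup, _ in panel.values() if dup) / n,
        "overabundant": sum(1 for _, over in panel.values() if over) / n,
        "overabundant_without_duplication": counts[(False, True)] / n,
        "duplicated_without_overabundance": counts[(True, False)] / n,
    }


# Hypothetical panel of five cell lines.
panel = {
    "line_1": (True, True),
    "line_2": (True, True),
    "line_3": (False, True),
    "line_4": (False, False),
    "line_5": (True, True),
}
for key, value in summarize_panel(panel).items():
    print(key, value)
```

A gene behaving as anticipated above would show a non-zero fraction for overabundance without duplication and a fraction near zero for duplication without overabundance.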

The practitioner will readily appreciate that the strategies for identifying genes that are
duplicated and/or associated with RNA overabundance may be reversed appropriately to screen
for genes that are deleted and/or associated with RNA underabundance. The principles are
essentially the same. Genes that are frequently down-regulated in cancer (such as tumor
suppressor genes) may be down-regulated by different mechanisms in different cells, and a gene
with this behavior is more likely to be central to malignant transformation or persistence of the
malignant state.

To screen for such down-regulated genes according to the present invention, RNA is
prepared from a plurality of tumors or cancer cell lines and the abundance is compared with an RNA
preparation from control cells. Again, it is highly preferable to use cancer cells that share a deleted
gene in the same chromosomal region, in order to focus any differences at the RNA level towards
particular alterations in cancer cells and away from normal variations or coincidental changes. The
CGH technique may be used to identify deletions in previously uncharacterized cancer cells. As
before, cancer cells may be chosen on the basis of previous knowledge of deleted regions; there is
no need to conduct methods such as CGH on previously characterized lines. cDNA from the RNA
of cancer cells is displayed (preferably by differential display) alongside cDNA copied from
(preferably uncultured) control cells, and cDNA is selected that appears to be underrepresented in
at least two (preferably more) of the cancer cells compared with the control cells. cDNA thus
selected may optionally be further screened against digested DNA preparations, to confirm that the
RNA underabundance observed in the cancer cell populations is attributable in at least a proportion
of the cells to an actual gene deletion.

As before, the cDNA may be used for sequencing or rescuing additional polynucleotides, in
this case not from the cancer cells but from cells containing or expressing the gene at normal
levels. Pharmaceuticals based on deleted genes or those associated with underexpressed RNA
are typically oriented at restoring or upregulating the gene, or a functional equivalent of the
encoded gene product.


The identification of four exemplary cancer associated genes

To identify particular RNA that is overabundant in cancer cells, RNA has been compared
between breast cancer cells and control cells. The amount of total cellular RNA was compared using
a modified differential display method. Primers were used for the 3' region of the mRNAs which have
an oligo-dT sequence, followed by two other nucleotides as described in the previous section.
Random or arbitrary primers of about 10 nucleotides were used for replication towards what
corresponds in the sequence to the 5' region of the mRNA. The labeled amplification product was
then separated by molecular weight on a polyacrylamide sequencing gel.

Particular mRNAs were chosen that were present in a higher proportion of the RNA in
cancerous cells, compared with control cells, according to the proportional intensity amongst
neighboring cDNA bands. The cDNA was recovered directly from the gel and amplified to provide a
probe for screening. Candidate polynucleotides were screened by a number of criteria, including both
Northern and Southern analysis to determine if the corresponding genes were duplicated or
responsible for RNA overabundance in breast cancer cells. Sequence data of the polynucleotides
was obtained and compared with sequences in GenBank. Novel polynucleotides with the desired
expression patterns were used to probe for longer cDNA inserts in a λgt10 library constructed from
the breast cancer cell line BT474, which were then sequenced.

Further description of the actual experimental events that occurred during identification of the
four exemplary genes, and sequence data for CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and
CH14-2a16-1 are provided in the Example section.

Preparation of polynucleotides, polypeptides and antibodies



Polynucleotides based on the cDNA of CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and
CH14-2a16-1 can be rescued from cloned plasmids and phage provided as part of this invention. They
may also be obtained from breast cancer cell libraries or mRNA preparations, or from normal human
tissues such as placenta, by judicious use of primers or probes based on the sequence data provided
herein. Alternatively, the sequence data provided herein can be used in chemical synthesis to
produce a polynucleotide with an identical sequence, or that incorporates occasional variations.

Polypeptides encoded by the corresponding mRNA can be prepared by several different
methods, all of which will be known to a practitioner of ordinary skill. For example, the appropriate
strand of the full-length cDNA can be operably linked to a suitable promoter, and transfected into a
suitable host cell. The host cell is then cultured under conditions that allow transcription and
translation to occur, and the polypeptide is subsequently recovered. Another convenient method is to
determine the polynucleotide sequence of the cDNA, and predict the polypeptide sequence according
to the genetic code. A polypeptide can then be prepared directly, for example, by chemical synthesis,
either identical to the predicted sequence, or incorporating occasional variations.
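
Predicting a polypeptide sequence from a cDNA sequence according to the genetic code can be sketched as follows. The sketch uses the standard genetic code and assumes that the coding strand and reading frame are already known; the input sequence shown is a made-up example, not one of the sequences of this invention.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna, start=0):
    """Translate a coding-strand cDNA from position `start` until a stop codon.

    Returns one-letter amino acid codes. Incomplete trailing codons are ignored.
    """
    cdna = cdna.upper()
    residues = []
    for i in range(start, len(cdna) - 2, 3):
        amino_acid = CODON_TABLE[cdna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        residues.append(amino_acid)
    return "".join(residues)


print(translate("ATGGCCAAGTGA"))  # MAK
```

For a real cDNA fragment, the reading frame would first have to be chosen, for example by translating all three frames of the appropriate strand and selecting the longest open reading frame.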

Antibodies against polypeptides of this invention may be prepared by any method known in
the art. For stimulating antibody production in an animal, it is often preferable to enhance the
immunogenicity of a polypeptide by such techniques as polymerization with glutaraldehyde, or
combining with an adjuvant such as Freund's adjuvant. The immunogen is injected into a suitable
experimental animal: preferably a rodent for the preparation of monoclonal antibodies; preferably a
larger animal such as a rabbit or sheep for preparation of polyclonal antibodies. It is preferable to
provide a second or booster injection after about 4 weeks, and begin harvesting the antibody source
no less than about 1 week later.

Sera harvested from the immunized animals provide a source of polyclonal antibodies.
Detailed procedures for purifying specific antibody activity from a source material are known within the
art. Unwanted activity cross-reacting with other antigens, if present, can be removed, for example, by
running the preparation over adsorbents made of those antigens attached to a solid phase, and
collecting the unbound fraction. If desired, the specific antibody activity can be further purified by such
techniques as protein A chromatography, ammonium sulfate precipitation, ion exchange
chromatography, high-performance liquid chromatography and immunoaffinity chromatography on a
column of the immunizing polypeptide coupled to a solid support.

Alternatively, immune cells such as splenocytes can be recovered from the immunized
animals and used to prepare a monoclonal antibody-producing cell line. See, for example, Harlow &
Lane (1986), U.S. Patent Nos. 4,491,632 (J.R. Wands et al.), U.S. 4,472,500 (C. Milstein et al.), and
U.S. 4,444,887 (M.K. Hoffman et al.).

Briefly, an antibody-producing line can be produced inter alia by cell fusion, or by transfecting
antibody-producing cells with Epstein Barr Virus, or transforming with oncogenic DNA. The treated
cells are cloned and cultured, and clones are selected that produce antibody of the desired specificity.
Specificity testing can be performed on culture supernatants by a number of techniques, such as
using the immunizing polypeptide as the detecting reagent in a standard immunoassay, or using cells
expressing the polypeptide in immunohistochemistry. A supply of monoclonal antibody from the
selected clones can be purified from a large volume of tissue culture supernatant, or from the ascites
fluid of suitably prepared host animals injected with the clone.

Effective variations of this method include those in which the immunization with the
polypeptide is performed on isolated cells. Antibody fragments and other derivatives can be prepared
by methods of standard protein chemistry, such as subjecting the antibody to cleavage with a
proteolytic enzyme. Genetically engineered variants of the antibody can be produced by obtaining a
polynucleotide encoding the antibody, and applying the general methods of molecular biology to
introduce mutations and translate the variant.

Use in diagnosis



Novel cDNA sequences corresponding to genes associated with cancer are potentially useful
as diagnostic aids. Similarly, polypeptides encoded by such genes, and antibodies specific for these
polypeptides, are also potentially useful as diagnostic aids.

More specifically, gene duplication or overabundance of RNA in particular cells can help
identify those cells as being cancerous, and thereby play a part in the initial diagnosis. Increased
levels of RNA corresponding to CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and CH14-2a16-1 are
present in a substantial proportion of breast cancer cell lines and primary breast tumors. In addition,
preliminary Northern analysis using probes for CH8-2a13-1, CH13-2a12-1, and CH14-2a16-1
indicates that these genes may be duplicated or be associated with RNA overabundance in certain
cell lines derived from cancers other than breast cancer, including colon cancer, lung cancer,
prostate cancer, glioma, and ovarian cancer.

For patients already diagnosed with cancer, gene duplication or overabundance of RNA can
assist with clinical management and prognosis. For example, overabundance of RNA may be a
useful predictor of disease survival, metastasis, susceptibility to various regimens of standard
chemotherapy, the stage of the cancer, or its aggressiveness. See generally the article by Blast, U.S.
Patent No. 4,968,603 (Slamon et al.) and PCT Application WO 94/00601 (Levine et al.). All of these
determinations are important in helping the clinician choose between the available treatment options.

A particularly important diagnostic application contemplated in this invention is the
identification of patients suitable for gene-specific therapy, as outlined in the following section. For
example, treatment directed against a particular gene or gene product is appropriate in cancers where
the gene is duplicated or there is RNA overabundance. Given a particular pharmaceutical that is
directed at a particular gene, a diagnostic test specific for the same gene is important in selecting
patients likely to benefit from the pharmaceutical. Given a selection of such pharmaceuticals specific
for different genes, diagnostic tests for each gene are important in selecting which pharmaceutical is
likely to benefit a particular patient.

The polynucleotide, polypeptide, and antibodies embodied in this invention provide specific
reagents that can be used in standard diagnostic procedures. The actual procedures for conducting
diagnostic tests are extensively known in the art and are routine for a practitioner of ordinary skill.
See, for example, U.S. Patent No. 4,968,603 (Slamon et al.), and PCT Applications WO 94/00601
(Levine et al.) and WO 94/17414 (K. Keyomarsi et al.). What follows is a brief non-limiting survey of
some of the known procedures that can be applied.

Generally, to perform a diagnostic method of this invention, one of the compositions of this
invention is provided as a reagent to detect a target in a clinical sample with which it reacts. Thus, the
polynucleotide of this invention can be used as a reagent to detect a DNA or RNA target such as
might be present in a cell with duplication or RNA overabundance of the corresponding gene. The
polypeptide can be used as a reagent to detect a target for which it has a specific binding site, such as
an antibody molecule or (if the polypeptide is a receptor) the corresponding ligand. The antibody can
be used as a reagent to detect a target it specifically recognizes, such as the polypeptide used as an
immunogen to raise it.

The target is supplied by obtaining a suitable tissue sample from an individual for whom the
diagnostic parameter is to be measured. Relevant test samples are those obtained from individuals
suspected of containing cancerous cells, particularly breast cancer cells. Many types of samples are
suitable for this purpose, including those that are obtained near the suspected tumor site by biopsy or
surgical dissection, in vitro cultures of cells derived therefrom, blood, and blood components. If
desired, the target may be partially purified from the sample or amplified before the assay is
conducted. The reaction is performed by contacting the reagent with the sample under conditions that
will allow a complex to form between the reagent and the target. The reaction may be performed in
solution, or on a solid tissue sample, for example, using histology sections. The formation of the
complex is detected by a number of techniques known in the art. For example, the reagent may be
supplied with a label and unreacted reagent may be removed from the complex; the amount of
remaining label thereby indicating the amount of complex formed. Further details and alternatives for
complex detection are provided in the descriptions that follow.

To determine whether the amount of complex formed is representative of cancerous or
non-cancerous cells, the assay result is compared with a similar assay conducted on a control sample. It
is generally preferable to use a control sample which is from a non-cancerous source, and otherwise
similar in composition to the clinical sample being tested. However, any control sample may be
suitable provided the relative amount of target in the control is known or can be used for comparative
purposes. Where the assay is being conducted on tissue sections, suitable control cells with normal
histopathology may surround the cancerous cells being tested. It is often preferable to conduct the
assay on the test sample and the control sample simultaneously. However, if the amount of complex
formed is quantifiable and sufficiently consistent, it is acceptable to assay the test sample and control
sample on different days or in different laboratories.

A polynucleotide embodied in this invention can be used as a reagent for determining gene
duplication or RNA overabundance that may be present in a clinical sample. The binding of the
reagent polynucleotide to a target in a clinical sample generally relies in part on a hybridization
reaction between a region of the polynucleotide reagent, and the DNA or RNA in a sample being
tested.

If desired, the nucleic acid may be extracted from the sample, and may also be partially
purified. To measure gene duplication, the preparation is preferably enriched for chromosomal DNA;
to measure RNA overabundance, the preparation is preferably enriched for RNA. The target
polynucleotide can be optionally subjected to any combination of additional treatments, including
digestion with restriction endonucleases, size separation, for example by electrophoresis in agarose
or polyacrylamide, and affixing to a reaction matrix, such as a blotting material.

Hybridization is allowed to occur by mixing the reagent polynucleotide with a sample
suspected of containing a target polynucleotide under appropriate reaction conditions. This may be
followed by washing or separation to remove unreacted reagent. Generally, both the target
polynucleotide and the reagent must be at least partly equilibrated into the single-stranded form in
order for complementary sequences to hybridize efficiently. Thus, it may be useful (particularly in
tests for DNA) to prepare the sample by standard denaturation techniques known in the art.

The minimum complementarity between the reagent sequence and the target sequence for a
complex to form depends on the conditions under which the complex-forming reaction is allowed to
occur. Such conditions include temperature, ionic strength, time of incubation, the presence of
additional solutes in the reaction mixture such as formamide, and washing procedure. Higher
stringency conditions are those under which higher minimum complementarity is required for stable
hybridization to occur. It is generally preferable in diagnostic applications to increase the specificity of
the reaction, minimizing cross-reactivity of the reagent polynucleotide with alternative undesired
hybridization sites in the sample. Thus, it is preferable to conduct the reaction under conditions of
high stringency: for example, in the presence of high temperature, low salt, formamide, a combination
of these, or followed by a low-salt wash.
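
The way temperature, salt, and formamide combine to set stringency can be illustrated with a widely cited empirical approximation for the melting temperature (Tm) of long DNA:DNA hybrids. This formula is not part of the present disclosure; it is introduced here only as an illustration, and its coefficients differ somewhat between published sources. The function name and example values are assumptions.

```python
import math


def approx_tm_celsius(length, gc_percent, sodium_molar, formamide_percent=0.0):
    """Approximate Tm of a long DNA:DNA hybrid (empirical rule of thumb).

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 0.61*(%formamide) - 600/length
    Coefficients vary between sources; treat the result as a rough guide only.
    """
    return (81.5
            + 16.6 * math.log10(sodium_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 600.0 / length)


# Hypothetical 300-base probe with 50% GC content.
standard = approx_tm_celsius(300, 50, sodium_molar=0.3)
low_salt = approx_tm_celsius(300, 50, sodium_molar=0.015)
with_formamide = approx_tm_celsius(300, 50, sodium_molar=0.3, formamide_percent=50)
print(round(standard, 1), round(low_salt, 1), round(with_formamide, 1))
```

Lowering the salt concentration or adding formamide lowers the estimated Tm, so at a fixed incubation or wash temperature the reaction runs closer to the Tm and imperfectly matched hybrids are less likely to remain stable.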
In order to detect the complexes formed between the reagent and the target, the reagent is
generally provided with a label. Some of the labels often used in this type of assay include
radioisotopes (such as those of phosphorus), chemiluminescent or fluorescent reagents such as
fluorescein, and enzymes such as alkaline phosphatase that are capable of producing a colored solute
or precipitate. The label may be intrinsic to the reagent, it may be attached by direct chemical linkage,
or it may be connected through a series of intermediate reactive molecules, such as a biotin-avidin
complex, or a series of inter-reactive polynucleotides. The label may be added to the reagent before
hybridization with the target polynucleotide, or afterwards.

To improve the sensitivity of the assay, it is often desirable to increase the signal ensuing
from hybridization. This can be accomplished by replicating either the target polynucleotide or the
reagent polynucleotide, such as by a polymerase chain reaction. Alternatively, a combination of
serially hybridizing polynucleotides or branched polynucleotides can be used in such a way that
multiple label components become incorporated into each complex. See U.S. Patent No. 5,124,246
(Urdea et al.).

An antibody embodied in this invention can also be used as a reagent in cancer diagnosis, or
for determining gene duplication or RNA overabundance that may be present in a clinical sample.
This relies on the fact that overabundance of RNA in affected cells is often associated with increased
production of the corresponding polypeptide. Several of the genes up-regulated in cancer cells
encode for cell surface receptors: for example, erbB-2, c-myc and epidermal growth factor.
Alternatively, the RNA may encode a protein kept inside the cell, or it may encode a protein secreted
by the cell into the surrounding milieu.

Any such protein product can be detected in solid tissue samples and cultured cells by
immunohistological techniques that will be obvious to a practitioner of ordinary skill. Generally, the
tissue is preserved by a combination of techniques which may include cooling, exchanging into
different solvents, fixing with agents such as paraformaldehyde, or embedding in a commercially
available medium such as paraffin or OCT. A section of the sample is suitably prepared and overlaid
with a primary antibody specific for the protein.

The primary antibody may be provided directly with a suitable label. More frequently, the
primary antibody is detected using one of a number of developing reagents which are easily produced
or available commercially. Typically, these developing reagents are anti-immunoglobulin or protein A,
and they typically bear labels which include, but are not limited to: fluorescent markers such as
fluorescein, enzymes such as peroxidase that are capable of precipitating a suitable chemical
compound, electron dense markers such as colloidal gold, or radioisotopes such as radioiodine. The
section is then visualized using an appropriate microscopic technique, and the level of labeling is
compared between the suspected cancer cell and a control cell, such as cells surrounding the tumor
area or those taken from an alternative site.

The amount of protein corresponding to the cancer-associated gene may be detected in a
standard quantitative immunoassay. If the protein is secreted or shed from the cell in any appreciable
amount, it may be detectable in plasma or serum samples. Alternatively, the target protein may be
solubilized or extracted from a solid tissue sample. Before quantitating, the protein may optionally be
affixed to a solid phase, such as by a blot technique or using a capture antibody.

A number of immunoassay methods are established in the art for performing the quantitation.
For example, the protein may be mixed with a pre-determined non-limiting amount of the reagent
antibody specific for the protein. The reagent antibody may contain a directly attached label, such as
an enzyme or a radioisotope, or a second labeled reagent may be added, such as
anti-immunoglobulin or protein A. For a solid-phase assay, unreacted reagents are removed by
washing. For a liquid-phase assay, unreacted reagents are removed by some other separation
technique, such as filtration or chromatography. The amount of label captured in the complex is
positively related to the amount of target protein present in the test sample. A variation of this
technique is a competitive assay, in which the target protein competes with a labeled analog for
binding sites on the specific antibody. In this case, the amount of label captured is negatively related
to the amount of target protein present in a test sample. Results obtained using any such assay on a
sample from a suspected cancer-bearing source are compared with those from a non-cancerous
source.
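
Whether the captured label is positively related (direct assay) or negatively related (competitive assay) to the amount of target, quantitation is commonly done by reading the sample signal off a curve of known standards. The sketch below uses simple linear interpolation between neighboring standards; the function name and the standard values are hypothetical, and a practical assay may call for a fitted non-linear curve instead.

```python
def interpolate_concentration(signal, standards):
    """Estimate target concentration from a measured signal.

    standards: list of (concentration, signal) pairs whose signal changes
    monotonically with concentration, either rising (direct assay) or
    falling (competitive assay).
    """
    points = sorted(standards, key=lambda pair: pair[1])  # order by signal
    if not points[0][1] <= signal <= points[-1][1]:
        raise ValueError("signal outside the range of the standard curve")
    for (c0, s0), (c1, s1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            if s1 == s0:
                return c0
            return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)


# Hypothetical standard curves (concentration, signal).
direct = [(0, 0.0), (10, 1.0), (20, 2.0)]       # signal rises with target
competitive = [(0, 2.0), (10, 1.0), (20, 0.0)]  # signal falls with target
print(interpolate_concentration(1.5, direct))       # 15.0
print(interpolate_concentration(1.5, competitive))  # 5.0
```

The same measured signal therefore maps to a high concentration in the direct format and a low concentration in the competitive format, which is why the assay format must be known before results from test and control samples are compared.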

A polypeptide embodied in this invention can also be used as a reagent in cancer diagnosis,
or for determining gene duplication or RNA overabundance that may be present in a clinical sample.
Overabundance of RNA in affected cells may result in the corresponding polypeptide being produced
by the cells in an abnormal amount. On occasion, overabundance of RNA may occur concurrently
with expression of the polypeptide in an unusual form. This in turn may result in stimulation of the
immune response of the host to produce its own antibody molecules that are specific for the
polypeptide. Thus, a number of human hybridomas have been raised from cancer patients that
produce antibodies against their own tumor antigens.

To use the polypeptide in the detection of such antibodies in a subject suspected of having
cancer, an immunoassay is conducted. Suitable methods are generally the same as the
immunoassays outlined in the preceding paragraphs, except that the polypeptide is provided as a
reagent and the antibody is the target in the clinical sample which is to be quantified. For example,
human IgG antibody molecules present in a serum sample may be captured with solid-phase protein
A, and then overlaid with the labeled polypeptide reagent. The amount of antibody would then be
proportional to the label attached to the solid phase. Alternatively, cells or tissue sections expressing
the polypeptide may be overlaid first with the test sample containing the antibody, and then with a
detecting reagent such as labeled anti-immunoglobulin. The amount of antibody would then be
proportional to the label attached to the cells. The amount of antibody detected in the sample from a
suspected cancerous source would be compared with the amount detected in a control sample.

These diagnostic procedures may be performed by diagnostic laboratories, experimental
laboratories, practitioners, or private individuals. This invention provides diagnostic kits which can be
used in these settings. The presence of cancer cells in the individual may be manifest in a clinical
sample obtained from that individual as an alteration in the DNA, RNA, protein, or antibodies
contained in the sample. An alteration in one of these components resulting from the presence of
cancer may take the form of an increase or decrease of the level of the component or an alteration in
the form of the component, compared with that in a sample from a healthy individual. The clinical
sample is optionally pre-treated for enrichment of the target being tested for. The user then applies a
reagent contained in the kit in order to detect the changed level or alteration in the diagnostic
component.

Each kit necessarily comprises the reagent which renders the procedure specific: a reagent
polynucleotide, used for detecting target DNA or RNA; a reagent antibody, used for detecting target
protein; or a reagent polypeptide, used for detecting target antibody that may be present in a sample
to be analyzed. The reagent is supplied in a solid form or liquid buffer that is suitable for inventory
storage, and later for exchange or addition into the reaction medium when the test is performed.
Suitable packaging is provided. The kit may optionally provide additional components that are useful
in the procedure. These optional components include buffers, capture reagents, developing reagents,
labels, reacting surfaces, means for detection, control samples, instructions, and interpretive
information.

Use in pharmaceutical development

Embodied in this invention are modes of treating subjects bearing cancer cells that have
overabundance of the particular RNA described. The strategy used to obtain the cDNAs provided in
this invention was deliberately focused on genes that achieve RNA overabundance by gene
duplication in some cells, and by alternative mechanisms in other cells. These alternative
mechanisms may include, for example, translocation or enhancement of transcription enhancing
elements near the coding region of the gene, deletion of repressor binding sites, or altered production
of gene regulators. Such mechanisms would result in more RNA being transcribed from the same
gene. Alternatively, the same amount of RNA may be transcribed, but may persist longer in the cell,
resulting in greater abundance. This could occur, for example, by reduction in the level of ribozymes
or protein enzymes that degrade RNA, or in the modification of the RNA to render it more resistant to
such enzymes or spontaneous degradation.

Thus, different cells make use of at least two different mechanisms to achieve a single result:
the overabundance of a particular RNA. This suggests that RNA overabundance of these genes is
central to the cancer process in the affected cells. Interfering with the specific gene or gene product
would consequently modify the cancer process. It is an objective of this invention to provide
pharmaceutical compositions that enable therapy of this kind.

One way this invention achieves this objective is through screening candidate drugs. The
general screening strategy is to apply the candidate to a manifestation of a gene associated with
cancer, and then determine whether the effect is beneficial and specific. For example, a composition
that interferes with a polynucleotide or polypeptide corresponding to any of the novel cancer-associated
genes described herein has the potential to block the associated pathology when administered to a
tumor of the appropriate phenotype. It is not necessary that the mechanism of interference be known;
only that the interference be preferential for cancerous cells (or cells near the cancer site) but not
other cells.

A preferred method of screening is to provide cells in which a polynucleotide related to a
cancer gene has been transfected. See, for example, PCT application WO 93/08701. A practitioner
of ordinary skill will be well acquainted with techniques for transfecting eukaryotic cells, including the
preparation of a suitable vector, such as a viral vector; conveying the vector into the cell, such as by
electroporation; and selecting cells that have been transformed, such as by using a reporter or drug
sensitivity element.

A cell line is chosen which has a phenotype desirable in testing, and which can be maintained
well in culture. The cell line is transfected with a polynucleotide corresponding to one of the
cancer-associated genes identified herein. Transfection is performed such that the polynucleotide is
operably linked to a genetic controlling element that permits the correct strand of the polynucleotide to
be transcribed within the cell. Successful transfection can be determined by the increased abundance
of the RNA compared with an untransfected cell. It is not necessary that the cell previously be devoid
of the RNA, only that the transfection result in a substantial increase in the level observed. RNA
abundance in the cell is measured using the same polynucleotide, according to the hybridization
assays outlined earlier.

Drug screening is performed by adding each candidate to a sample of transfected cells, and
monitoring the effect. The experiment includes a parallel sample which does not receive the
candidate drug. The treated and untreated cells are then compared by any suitable phenotypic
criteria, including but not limited to microscopic analysis, viability testing, ability to replicate,
histological examination, the level of a particular RNA or polypeptide associated with the cells, the
level of enzymatic activity expressed by the cells or cell lysates, and the ability of the cells to interact
with other cells or compounds. Differences between treated and untreated cells indicate effects
attributable to the candidate. In a preferred method, the effect of the drug on the cell transfected with
the polynucleotide is also compared with the effect on a control cell. Suitable control cells include
untransfected cells of similar ancestry, cells transfected with an alternative polynucleotide, or cells
transfected with the same polynucleotide in an inoperative fashion. Optimally, the drug has a greater
effect on operably transfected cells than on control cells.
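
As an illustrative sketch only, the two comparisons described above can be expressed numerically. The readout values and the function names `drug_effect` and `specificity` are hypothetical and are not part of this disclosure; any phenotypic readout for which a lower value means a stronger effect (for example, viability) could be substituted.

```python
def drug_effect(untreated, treated):
    """Fractional reduction of a phenotypic readout caused by the candidate drug."""
    if untreated <= 0:
        raise ValueError("untreated readout must be positive")
    return (untreated - treated) / untreated


def specificity(transfected, control):
    """Effect on operably transfected cells minus effect on control cells.

    Each argument is an (untreated, treated) pair of readouts.
    A positive value means the drug acts preferentially on transfected cells.
    """
    return drug_effect(*transfected) - drug_effect(*control)


# Hypothetical viability readouts in arbitrary units: (untreated, treated)
transfected_cells = (100.0, 40.0)
control_cells = (100.0, 90.0)

print(round(drug_effect(transfected_cells[0], transfected_cells[1]), 2))
print(round(specificity(transfected_cells, control_cells), 2))
```

A candidate with a large effect but a specificity near zero would act equally on control cells, which is the situation the control comparison is meant to expose.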

Desirable effects of a candidate drug include an effect on any phenotype that was conferred
by transfection of the cell line with the polynucleotide from the cancer-associated gene, or an effect
that could limit a pathological feature of the gene in a cancerous cell. Examples of the first type would
be a drug that limits the overabundance of RNA in the transfected cell, limits production of the
encoded protein, or limits the functional effect of the protein. The effect of the drug would be apparent
when comparing results between treated and untreated cells. An example of the second type would
be a drug that makes use of the transfected gene or a gene product to specifically poison the cell.
The effect of the drug would be apparent when comparing results between operably transfected cells
and control cells.

Use in treatment

This invention also provides gene-specific pharmaceuticals in which each of the
polynucleotides, polypeptides, and antibodies embodied herein serves as a specific active ingredient in
pharmaceutical compositions. Such compositions may decrease the pathology of cancer cells on
their own, or render the cancer cells more susceptible to treatment by non-specific agents, such
as classical chemotherapy or radiation.

An example of how polynucleotides embodied in this invention can be effectively used in
treatment is gene therapy. See, for example, Morgan et al., Culver et al., and U.S. Patent No.
5,399,346 (French et al.). The general principle is to introduce the polynucleotide into a cancer cell in
a patient, and allow it to interfere with the expression of the corresponding gene, such as by
complexing with the gene itself or with the RNA transcribed from the gene. Entry into the cell is
facilitated by suitable techniques known in the art, such as providing the polynucleotide in the form of a
suitable vector, or encapsulation of the polynucleotide in a liposome. The polynucleotide may be
provided to the cancer site by an antigen-specific homing mechanism, or by direct injection.

A preferred mode of gene therapy is to provide the polynucleotide in such a way that it will
replicate inside the cell, enhancing and prolonging the interference effect. Thus, the polynucleotide is
operably linked to a suitable promoter, such as the natural promoter of the corresponding gene, a
heterologous promoter that is intrinsically active in cancer cells, or a heterologous promoter that can
be induced by a suitable agent. Preferably, the construct is designed so that the polynucleotide
sequence operably linked to the promoter is complementary to the sequence of the corresponding
gene. Thus, once integrated into the cellular genome, the transcript of the administered
polynucleotide will be complementary to the transcript of the gene, and capable of hybridizing with it.
This approach is known as antisense therapy. See, for example, Culver et al. and Roth.
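
The complementarity requirement can be illustrated with a minimal sketch. The sequence below is a made-up fragment, not one of the sequences of this invention, and the helper names `reverse_complement` and `antisense_rna` are hypothetical. Given the coding (sense) strand of a gene, the antisense transcript is the reverse complement written as RNA.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_rna(coding_strand):
    """RNA complementary to the mRNA transcribed from the given coding strand."""
    return reverse_complement(coding_strand).upper().replace("T", "U")


sense = "ATGGCCTTAG"  # hypothetical fragment of a coding strand, 5' to 3'
print(reverse_complement(sense))  # CTAAGGCCAT
print(antisense_rna(sense))       # CUAAGGCCAU
```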

The use of antibodies embodied in this invention in the treatment of cancer partly relies on the
fact that genes that show RNA overabundance in cancer frequently encode cell-surface proteins.
Location of these proteins at the cell surface may correspond to an important biological function of the
cancer cell, such as its interaction with other cells, the modulation of other cell-surface proteins, or
triggering by an incoming cytokine.

These mechanisms suggest a variety of ways in which a specific antibody may be effective in
decreasing the pathology of a cancer cell. For example, if the gene encodes a growth receptor,
then an antibody that blocks the ligand binding site or causes endocytosis of the receptor would
decrease the ability of the receptor to provide its signal to the cell. It is unnecessary to have
knowledge of the mechanism beforehand; the effectiveness of a particular antibody can be predicted
empirically by testing with cultured cancer cells expressing the corresponding protein. Monoclonal
antibodies may be more effective in this form of cancer therapy if several different clones directed at
different determinants of the same cancer-associated gene product are used in combination: see PCT
application WO 94/00136 (Kasprzyk et al.). Such antibody treatment may directly decrease the
pathology of the cancer cells, or render them more susceptible to non-specific cytotoxic agents such
as platinum (Lippman).

Another example of how antibodies can be used in cancer therapy is in the specific targeting
of effector components. The protein product of the cancer-associated gene is expected to appear in
high frequency on cancer cells compared to unaffected cells, due to the overabundance of the
corresponding RNA. The protein therefore provides a marker for cancer cells that a specific antibody
can bind to. An effector component attached to the antibody therefore becomes concentrated near
the cancer cells, improving the effect on those cells and decreasing the effect on non-cancer cells.
This concentration would generally occur not only near the primary tumor, but also near cancer cells
that have metastasized to other tissue sites. Furthermore, if the antibody is able to induce
endocytosis, this will enhance entry of the effector into the cell interior.

For the purpose of targeting, an antibody specific for the protein of the cancer-associated
gene is conjugated with a suitable effector component, preferably by a covalent or high-affinity bond.
Suitable effector components in such compositions include radionuclides, toxic chemicals
such as vincristine, and toxic peptides such as diphtheria toxin. Other suitable effector components
include peptides or polynucleotides capable of altering the phenotype of the cell in a desirable fashion:
for example, installing a tumor suppressor gene, or rendering the cell susceptible to immune attack.

In most applications of antibody molecules in human therapy, it is preferable to use human
monoclonals, or antibodies that have been humanized by techniques known in the art. This helps
prevent the antibody molecules themselves from becoming a target of the host's immune system.

An example of how polypeptides embodied in this invention can be effectively used in
treatment is through vaccination. The growth of cancer cells is naturally limited in part due to immune
surveillance. This refers to the recognition of cancer cells by immune recognition units, particularly
antibodies and T cells, and the consequent triggering of immune effector functions that limit tumor
progression. Stimulation of the immune system using a particular tumor-specific antigen enhances
the effect towards the tumor expressing the antigen. Thus, an active vaccine comprising a
polypeptide encoded by the cDNA of this invention would be appropriately administered to subjects
having overabundance of the corresponding RNA. There may also be a prophylactic role for the
vaccine in a population predisposed for developing cancer cells with overabundance of the same
RNA.

Ways of increasing the effectiveness of cancer vaccines are known in the art (Beardsley,
Maclean et al.). For example, synthetic antigens are conjugated to a carrier like keyhole limpet
hemocyanin (KLH), and then combined with an adjuvant such as DETOX™, a mixture of
mycobacterial cell walls and lipid A. Any polypeptide encoded by the four novel genes described in
this invention can be used in analogous compositions.

Methods for preparing and administering polypeptide vaccines are known in the art. Peptides
may be capable of eliciting an immune response on their own, or they may be rendered more
immunogenic by chemical manipulation, such as cross-linking or attaching to a protein carrier like
KLH. Preferably, the vaccine also comprises an adjuvant, such as alum, muramyl dipeptides,
liposomes, or DETOX™. The vaccine may optionally comprise auxiliary substances such as wetting
agents, emulsifying agents, and organic or inorganic salts or acids. It also comprises a
pharmaceutically acceptable excipient which is compatible with the active ingredient and appropriate
for the route of administration. The desired dose for peptide vaccines is generally from 10 µg to 1 mg,
with a broad effective latitude. The vaccine is preferably administered first as a priming dose, and
then again as a boosting dose, usually at least four weeks later. Further boosting doses may be given
to enhance the effect. The dose and its timing are usually determined by the person responsible for
the treatment.

Sequence data and deposits

The foregoing detailed description provides, inter alia, a detailed explanation of how genes
associated with cancer can be identified and their cDNA obtained. Polynucleotide sequences for
CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and CH14-2a16-1 are provided.

The sequence data listed in this application were obtained by two-directional sequencing,
except where indicated otherwise. The data are believed to be accurate; nevertheless, it is readily
appreciated that the techniques of the art as used herein have the potential of introducing occasional
and infrequent sequence errors. Clones and inserts obtained via PCR may also comprise occasional
errors introduced during amplification. Nucleotide sequences predicted from database compilations,
and sequence data obtained by one-directional sequencing, may also contain occasional errors in
accordance with the limitations of the underlying techniques. In addition, allelic variations to both
nucleotide and amino acid sequences may occur naturally or be deliberately induced. Differences of
any of these types between the sequences provided herein and the invention as practiced may be
present without departing from the spirit of the invention.

Sequence data for CH8-2a13-1 and CH13-2a12-1 cDNA are believed to comprise the entire
translated coding sequence, and 5' and 3' untranslated regions corresponding to those found in
typical mRNA transcripts. Multiple mRNA transcripts may be found depending on the patterns of
transcript processing in various cell types of interest. Sequence data for CH1-9a11-2 and
CH14-2a16-1 cDNA comprise a portion of the coding sequence and 3' untranslated regions.
Additional sequence is typically present in the corresponding mRNA transcripts, comprising an
additional coding region in the N-terminal direction of the protein, and possibly a 5' untranslated
region.

Certain embodiments of this invention may be practiced by polynucleotide synthesis
according to the data provided herein, by rescuing an appropriate insert corresponding to the gene of
interest from one of the deposits listed below, or by isolating a corresponding polynucleotide from a
suitable tissue source. Various useful probes and primers for use in polynucleotide isolation are
provided herein, or may be designed from the sequence data.

Three deposits were made on May 31, 1996 with the American Type Culture Collection
(ATCC), 12301 Parklawn Drive, Rockville, Maryland 20852, under terms of the Budapest Treaty. The
deposits are outlined in Table 2:



TABLE 2: ATCC Deposits

BCGF1
Accession No. 90U/4

Mixture of E. coli with recombinant plasmids of cDNA fragments of genes
associated with breast cancer. The 8 recombinant plasmids may be separated
by plating on Ampicillin plates and selecting single colonies for analysis by PCR
using SP6 and T7 primers.

Gene           Subclone      Expected size of PCR product
CH1-9a11-2     pch1-1.1      1.1 kb
               pch1-2.5      2.5 kb
CH8-2a13-1     pch8-600      600 bp
               pch8-3k       3.0 kb
               pch8-4k       4.0 kb
CH14-2a16-1    pch14-800     800 bp
               pch14-1.6     1.6 kb
               pch14-1.3     1.3 kb

BCGF2
Accession No. 97595

Mixture of λgt10 recombinant phages with cDNA inserts of genes associated
with breast cancer. The 2 phages may be separated by growing in the E. coli
host (strain NM514) and plating out for single plaques. These plaques can be
distinguished by PCR using λgt10 reverse and forward primers.

Gene           Phage         Expected size of PCR product
CH13-2a12-1    λch13-3.5     3.5 kb
CH14-2a16-1    λch14-2.5     2.5 kb

λBCBT474
Accession No. 97594

cDNA library derived from breast cancer cell line BT474 in λgt10 vector,
supplemented with a cDNA library from breast cancer cell line 600PE in λgt10
vector. The cDNA insert sizes range from about 0.5 to 5 kb.
λBCBT474 is a source of additional cDNA inserts corresponding to
CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, or CH14-2a16-1 not present in
BCGF-1 or BCGF-2.



Sequence databases contain sequences of polynucleotide and polypeptide fragments with
various degrees of identity and overlap with certain embodiments of this invention. The following list
of accession numbers is provided for the interest of the reader; it is not intended to be comprehensive
or a limitation on the invention. The database disclosures do not typically indicate use in cancer
diagnosis, drug development, or disease treatment.

The following GenBank accession numbers are listed in relation to CH1-9a11-2: dbEST
N32686; N45113; N36176; N22982; AA278630; H88670; AA235936; AA236951; H26301; N26026;
H88063; H88064; D61948; H88718; H26460; AA137920; AA145308; W12952; AA200687; N44164;
T27279; dbSTS G22044; G04961.

The following GenBank accession numbers are listed in relation to CH8-2a13-1: dbNR
083780.

The following GenBank accession numbers are listed in relation to CH13-2a12-1: dbNR
U58090; dbEST AA182441; AA253924; AA179765; AA112715; AA112640; V\«7977; AA150317;
W68080; AA150243; AA100446; W89636; H46574; AA245889; AA100651; H77368; AA192778;
T85671; N32682; T86257; T78239; T77874; AA187866; Z33557; R40816; N99802; R19302;
AA100650; N55904; AA257161; H77369; T79014.

The following GenBank accession numbers are listed in relation to CH14-2a16-1: dbEST
N64802; W56903; N31400; W95674; AA233561; AA233636; N24105; W03447; W25821; AA233666;
AA233647; N67843; D55778; T66839; N55370; N75650; AA280736; H97110; 219643; H91250;
AA230765; R93089; T84665; W94857; R92873.

The examples presented below are provided as a further guide to a practitioner of ordinary
skill in the art, and are not meant to be limiting in any way.

Examples

Example 1: Selecting cDNA for messenger RNA that is overabundant in breast cancer cells

Total RNA was isolated from each breast cancer cell line or control cell by centrifugation
through a gradient of guanidine isothiocyanate/CsCl. The RNA was treated with RNase-free DNase
(Promega, Madison, WI). After extraction with phenol-chloroform, the RNA preparations were stored
at -70°C. Oligo-dT polynucleotides for priming at the 3' end of messenger RNA with the sequence
T₁₁NM (where N ∈ {A,C,G} and M ∈ {A,C,G,T}) were synthesized according to standard protocols.
Arbitrary decamer polynucleotides (OPA01 to OPA20) for priming towards the 5' end were purchased
from Operon Biotechnology, Inc., Alameda, CA.
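
The size of the primer space implied by this design can be sketched as follows. The sketch assumes the anchored primer consists of eleven T residues followed by the two anchoring bases N and M (a reading consistent with the T11AC primer listed in Table 3), and it represents the decamers only by their catalogue names, since their sequences are not given here. The function name is hypothetical.

```python
from itertools import product


def anchored_oligo_dt_primers(t_length=11):
    """Enumerate anchored oligo-dT primers of the form T(n)NM.

    N is any base except T, so the primer anchors at the start of the
    poly-A tail; M is any of the four bases.
    """
    penultimate = "ACG"  # N
    last = "ACGT"        # M
    return ["T" * t_length + n + m for n, m in product(penultimate, last)]


primers = anchored_oligo_dt_primers()
decamers = [f"OPA{i:02d}" for i in range(1, 21)]  # OPA01 .. OPA20, names only
pairs = list(product(primers, decamers))

print(len(primers))  # 12 anchored primers
print(len(pairs))    # 240 possible primer combinations
```

Only a subset of these combinations needs to be run in practice; the count simply shows how many distinct displays the two primer sets could generate.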

The RNA was reverse-transcribed using AMV reverse transcriptase (obtained from BRL) and
an anchored oligo-dT primer in a volume of 20 µL, according to the manufacturer's directions. The
reaction was incubated at 37°C for 60 min and stopped by incubating at 95°C for 5 min. The cDNA
obtained was used immediately or stored frozen at -70°C.

Differential display was conducted according to the following procedure: 1 µL of cDNA was
replicated in a total volume of 10 µL of PCR mixture containing the appropriate T₁₁NM sequence, 0.5 µM
of a decamer primer, 200 µM dNTP, 5 µCi [35S]-dATP (Amersham), Taq polymerase buffer with 2.5
mM MgCl2, and 0.3 unit Taq polymerase (Promega). Forty cycles were conducted in the following
sequence: 94°C for 30 sec, 40°C for 2 min, 72°C for 30 sec; and then the sample was incubated at
72°C for 5 min. The replicated cDNA was separated on a 6% polyacrylamide sequencing gel. After
electrophoresis, the gel was dried and exposed to X-ray film.
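
The cycling program above can be written down as a small data structure, which makes it easy to check the total programmed hold time. This is a sketch only: the class and function names are hypothetical, and the calculation ignores ramp times between temperatures, which depend on the instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float  # hold temperature in degrees Celsius
    seconds: int   # hold time in seconds


# One cycle: denaturation, low-stringency annealing, extension.
CYCLE = [Step(94, 30), Step(40, 120), Step(72, 30)]
N_CYCLES = 40
FINAL_EXTENSION = Step(72, 300)


def total_hold_seconds(cycle, n_cycles, final_step):
    """Sum of programmed hold times, excluding temperature ramps."""
    return n_cycles * sum(step.seconds for step in cycle) + final_step.seconds


minutes = total_hold_seconds(CYCLE, N_CYCLES, FINAL_EXTENSION) / 60
print(minutes)  # 125.0
```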

The autoradiogram was analyzed for labeled cDNA that was present in larger relative amount
in all of the lanes corresponding to breast cancer cells, compared with all of the lanes corresponding
to control cells. Figure 1 provides an example of an autoradiogram from such an experiment. Lane
1 is from non-proliferating normal breast cells; lane 2 is from proliferating normal breast cells; lanes
3 to 5 are from breast cancer cell lines BT474, SKBR3, and MCF7. The left and right sides show
the pattern obtained from experiments using the same T₁₁NM sequence (T₁₁AC), but two different
decamer primers. The arrows indicate the cDNA fragments that were more abundant in all three
tumor lines compared with controls.

The assay illustrated in Figure 1 was conducted using different combinations of oligo-dT
primers and decamer primers. A number of differentially expressed bands were detected when
different primer combinations were used. However, not all differences seen initially were
reproducible after re-screening. We therefore routinely repeated each differential display for each
primer combination. Only bands showing RNA overabundance in at least 2 experiments were
selected for further analysis.
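
The reproducibility rule can be sketched as a simple filter. The band identifiers and the function name are hypothetical; in practice a band would be identified by its primer combination and gel position.

```python
from collections import Counter


def reproducible_bands(experiments, min_hits=2):
    """Return bands scored as overabundant in at least `min_hits` experiments.

    `experiments` is a list of collections of band identifiers, one
    collection per repeat of the same primer combination.
    """
    counts = Counter(band for run in experiments for band in set(run))
    return sorted(band for band, n in counts.items() if n >= min_hits)


# Hypothetical results from three repeats of one primer combination
runs = [
    {"band_a", "band_b", "band_c"},
    {"band_a", "band_c"},
    {"band_c", "band_d"},
]
print(reproducible_bands(runs))  # ['band_a', 'band_c']
```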

It is preferable to include in the differential display experiment RNA derived from uncultured 
normal mammary epithelial cells (termed "organoids"). These cells are obtained from surgical 
samples resected from healthy breast tissue, which are then coaxed apart by blunt dissection 
techniques and mild enzyme treatment. Using organoids as the negative control, 33 cDNA 
fragments were isolated from 15 displays. 

Example 2: Sub-selecting cDNA that corresponds to genes that are duplicated in breast
cancer cells

cDNA fragments that were differentially expressed in the fashion described in Example 1
were excised from the dried gel and extracted by boiling at 95°C for 10 min. Eluted cDNA was
recovered by ethanol precipitation, and replicated by PCR. The product was cloned into the pCRII
vector using the TA cloning system (Invitrogen).

EcoRI-digested placenta DNA, and EcoRI-digested DNA from the breast cancer cell lines
BT474, SKBR3 and ZR-75-30 were used to prepare Southern blots to screen the cloned cDNA
fragments. The cloned cDNA fragments were labeled with [32P]-dCTP, and used individually to probe
the blots. A larger relative amount of binding of the probe to the lanes corresponding to the cancer
cell DNA indicated that the corresponding gene had been duplicated in the cancer cells. The labeled
cDNA probes were also used in Northern blots to verify that the corresponding RNA was
overabundant in the appropriate cell lines.

To determine whether the cDNA fragments obtained by this selection procedure
corresponded to novel genes, a partial nucleotide sequence was obtained using M13 primers.
Each sequence was compared with the known sequences in GenBank. In initial experiments, 5 of
the first 7 genes sequenced were mitochondrial genes. To avoid repeated isolation of
mitochondrial genes, subsequent screening experiments were done with additional lanes in the
DNA blot analysis for EcoRI-digested and HindIII-digested mitochondrial DNA. Any cDNA fragment
that hybridized to the appropriate mitochondrial restriction fragments was suspected of
corresponding to a mitochondrial gene, and not analyzed further.

From the 33 cDNA fragments detected from differential displays using organoid mRNA, 12
were subcloned. Of these 12, 6 detected suitable gene duplications in the appropriate cell lines.
Three cDNA failed to detect duplicated genes, and 3 appeared to correspond to mitochondrial
genes. Sequence analysis of the 6 suitable cDNA fragments showed no identity to any known
genes.

To obtain longer cDNA corresponding to the cDNA fragments with novel sequences, the
fragments were used as probes to screen a cDNA library from breast cancer cell line BT474,
constructed in lambda GT10. The longer cDNA obtained from lambda GT10 were sequenced
using lambda GT10 primers. The chromosomal locations of the cDNAs were determined using
panels of somatic cell hybrids.

Four of the 6 novel cDNA identified so far have been processed in this fashion. The 
probes used to obtain the 4 new breast cancer genes are shown in Table 3. 



TABLE 3: Primers used for Differential Display

cDNA           Oligo-dT primer          Arbitrary primer
CH1-9a11-2     Tilde (SEQ ID NO:9)      SEQ ID NO:11
CH8-2a13-1     T₁₁AC (SEQ ID NO:10)     SEQ ID NO:12
CH13-2a12-1    T₁₁AC (SEQ ID NO:10)     SEQ ID NO:13
CH14-2a16-1    T₁₁AC (SEQ ID NO:10)     SEQ ID NO:14

Example 3: Using the cDNA to test panels of breast cancer cells 

To determine the proportion of breast cancers in which the putative breast cancer genes
were duplicated, or showed RNA overabundance without gene duplication, the four cDNA obtained
according to the selection procedures described were used to probe a panel of breast cancer cell
lines and primary tumors.

Gene duplication was detected either by Southern analysis or slot-blot analysis. For
Southern analysis, 10 µg of EcoRI-digested genomic DNA from different cell lines was
electrophoresed on 0.8% agarose and transferred to a HYBOND™ N+ membrane (Amersham). The
filters were hybridized with 32P-labeled cDNA for the putative breast cancer gene. After an
autoradiogram was obtained, the probe was stripped and the blot was re-probed using a reference
probe to adjust for differences in sample loading. Either chromosome 2 probe D2S5 or chromosome
21 probe D21S6 was used as a reference. Densities of the signals on the autoradiograms were
obtained using a densitometer (Molecular Dynamics). The density ratio between the breast cancer
gene and the reference gene was calculated for each sample. Two samples of placental DNA digests
were run in each Southern analysis as a control.

For slot-blot analysis, 1 µg of genomic DNA was denatured and slotted on the HYBOND™
membrane. D21S5 or human repetitive sequences were used as reference probes for slot blots. The
density ratio between the breast cancer gene and the reference gene was calculated for each sample.
10-15 samples of placental DNA digests were used as control. Amongst the control samples, the
highest density ratio was set at 1.0. The density ratios of the tumor cell lines were standardized
accordingly. An arbitrary cut-off for the standardized ratio (typically 1.3) was defined to identify
samples in which the putative gene had been duplicated. Each of the cell lines in the breast cancer
panel was scored positively or negatively for duplication of the gene being tested.
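
The standardization and scoring steps can be sketched as follows. All density values and sample names are hypothetical, the function names are not from this disclosure, and the sketch assumes that a standardized ratio equal to the cut-off counts as positive, which the text does not specify.

```python
def density_ratio(gene_density, reference_density):
    """Ratio of the candidate-gene signal to the reference-probe signal."""
    return gene_density / reference_density


def standardize(sample_ratios, control_ratios):
    """Rescale ratios so that the highest control ratio equals 1.0."""
    top_control = max(control_ratios)
    return {name: ratio / top_control for name, ratio in sample_ratios.items()}


def score_duplication(standardized, cutoff=1.3):
    """Score each sample '+' or '-' (assumption: ratio >= cutoff is '+')."""
    return {name: ("+" if ratio >= cutoff else "-")
            for name, ratio in standardized.items()}


# Hypothetical densitometer readings: (gene density, reference density)
controls = [density_ratio(g, r) for g, r in [(40, 100), (50, 100), (45, 100)]]
samples = {
    "cell_line_1": density_ratio(100, 100),
    "cell_line_2": density_ratio(55, 100),
}

standardized = standardize(samples, controls)
print(standardized["cell_line_1"])      # 2.0
print(score_duplication(standardized))  # {'cell_line_1': '+', 'cell_line_2': '-'}
```

Setting the highest control to 1.0 makes the scoring conservative: a tumor sample must exceed every control by the cut-off margin before it is called duplicated.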

Some of the cell lines in the panel were known to have duplicated chromosomal regions from
comparative genomic hybridization analysis. In instances where the cDNA being used as probe
mapped to the known amplified region, the cDNA indicated that the corresponding gene had also
been duplicated. However, duplicated genes were also detected using each of the four cDNAs in
instances where comparative genomic hybridization had not revealed any amplification.

Because of the nature of the technique, the standardized ratio calculated as described
underestimates the gene copy number, although it is expected to rank in the same order. For
example, the standardized ratio obtained for the c-myc gene in the SKBR3 breast cancer cell was 5.0.
However, it is known that SKBR3 has approximately 50 copies of the c-myc gene.

To test for overabundance of RNA, 10 µg of total RNA from breast cancer cell lines or primary
breast cancer tumors were electrophoresed on 0.8% agarose in the presence of the denaturant
formamide, and then transferred to a nylon membrane. The membrane was probed first with
32P-labeled cDNA corresponding to the putative breast cancer gene, then stripped and reprobed with
32P-labeled cDNA for the beta-actin gene to adjust for differences in sample loading. Ratios of
densities between the candidate gene and the beta-actin gene were calculated. RNA from three
different cultured normal epithelial cells were included in the analysis as a control for the normal level
of gene expression. The highest ratio obtained from the normal cell samples was set at 1.0, and the
ratios in the various tumor cells were standardized accordingly.

Example 4: Chromosome 1 gene CH1-9a11-2

One of the cDNA obtained through the selection procedures of Examples 1 and 2
corresponded to a gene that mapped to Chromosome 1.

Table 4 summarizes the results of the analysis for gene duplication and RNA overabundance.
Both quantitative and qualitative assessments are shown. The numbers shown were obtained by
comparing the autoradiograph intensity of the hybridizing band in each sample with that of the
controls. Several control samples were used for the gene duplication experiments, consisting of
different preparations of placental DNA. The control sample with the highest level of intensity was
used for standardizing the other values. Other sources used for this analysis were breast cancer cell
lines with the designations shown. For reasons stated in Example 3, the quantitative number is not a
direct indication of the gene copy number, although it is expected to rank in the same order. Similarly,
up to 6 control samples were used for the RNA overabundance experiments, consisting of different
preparations of breast cell organoids which had been maintained briefly in tissue culture until the
experiment was performed. The control sample with the highest level of intensity was used for
standardizing the other values. Each cell line was scored + or - according to an arbitrary cut-off value.

+ = Gene duplication or RNA overabundance; - = no duplication or overabundance; nd = not done
* Degree of gene duplication is reported relative to placental DNA preparation.
** Degree of RNA overabundance is reported relative to the highest level observed for
several cultures of normal epithelial cells. Two hybridizing species of RNA
are calculated and reported separately.


The gene corresponding to the CH1-9a11-2 cDNA was duplicated in 9 out of 15 (60%) of the
breast cancer cell lines tested, compared with placental DNA digests (P3 and P12). The sequence of
the 115 bases from the 5' end of the cDNA fragment (SEQ. ID NO:1) is shown in Figure 22. There
was no substantial homology to any known gene in GenBank. One of the three possible reading
frames was found to be open, with the predicted amino acid sequence shown in Figure 22 (SEQ. ID NO:2).
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The CH1-9a11-2 gene was further characterized by obtaining additional sequence 
information. A λ-GT10 cDNA library from the breast cancer cell line BT474 (Example 2) was
screened using the initial cDNA insert, and a clone with a 2.5 kilobase insert was identified. The
identified clone was subcloned into plasmid vector pCRII. T7 and Sp6 primers for regions flanking the
cDNA inserts were used as initial sequencing primers:

T7 primer (SEQ. ID NO:42)
5'-TAATACGACTCACTATAGGGAGA-3'

Sp6 primer (SEQ. ID NO:43)
5'-CATACGATTTAGGTGACACTATAG-3'



Sequencing continued by walking along the region of interest by standard techniques, using
sequencing primers based on data already obtained. Primers used in sequencing are designated 1-16
in Figure 7.

A second clone (designated pCH1-1.1) overlapping on the 5' end was obtained using the
CLONTECH Marathon™ cDNA Amplification Kit. A map showing the overlapping regions is provided
in Figure 6. Briefly, two DNA primers designated CH1a and CH1b (Figure 7) were synthesized.
Polyadenylated RNA from breast cancer cell line 600PE was reverse transcribed using the CH1b primer.
After second strand synthesis, adaptor DNA provided in the kit was ligated to the double-stranded
cDNA. The 5' end cDNA of CH1-9a11-2 was then amplified by PCR using primers CH1a and AP1
(provided in the kit). To increase the specificity of the PCR products, the first PCR products were
reamplified by PCR using nested primers CH1a and AP2 (provided in the kit). The PCR products were
cloned into pCRII vector (Invitrogen) and screened with the CH1-9a11-2 probe.

The sequence of 3452 base pairs between the 5' end of pCH1-1.1 and the poly-A tail of
CH1-9a11-2 was determined by standard sequencing techniques. The DNA sequence is shown in Figure
8 (SEQ. ID NO:15). The longest open reading frame is in frame 1 (bases 1-1875), and codes for 624
amino acids before the stop codon. The corresponding amino acid sequence of this frame is shown
in the upper panel of Figure 9 (SEQ. ID NO:16). The partial sequence predicted for the translated
protein is listed in the lower panel of Figure 9 (SEQ. ID NO:17). Bases 1876 to the end of the sequence
are believed to be a 3' untranslated region. A hydrophobicity analysis identified a putative membrane
insertion or membrane spanning region at about amino acids 382-400, indicated in Figure 9 by
underlining.
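The reading-frame statements above (a stop-free run in one of three frames, with the amino acid count equal to the number of codons before the stop) can be checked with a small scanner. The sketch below is an assumption about how such a scan might be written, not the software used for this application; `longest_open_frame` and the toy sequence are hypothetical, and only the forward strand is considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_open_frame(dna):
    """Return (frame, start, end, n_codons) for the longest stop-free run.

    frame is 1, 2 or 3; start and end are 1-based base positions of the
    run, which excludes any terminating stop codon.
    """
    dna = dna.upper()
    best = (0, 0, 0, 0)
    for offset in range(3):
        run_start = offset
        for pos in range(offset, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - run_start) // 3
                if n_codons > best[3]:
                    best = (offset + 1, run_start + 1, pos, n_codons)
                run_start = pos + 3
        # A run may also reach the end of the sequence without a stop codon.
        end = offset + ((len(dna) - offset) // 3) * 3
        n_codons = (end - run_start) // 3
        if n_codons > best[3]:
            best = (offset + 1, run_start + 1, end, n_codons)
    return best


# Toy sequence (hypothetical): five sense codons followed by a stop in frame 1.
toy_result = longest_open_frame("ATGGCTGCTGCTGCTTAA")
print(toy_result)

# Bases 1-1875 span 625 codons; the last is the stop, leaving 624 residues.
codons_in_frame = 1875 // 3
residues_before_stop = codons_in_frame - 1
print(residues_before_stop)
```

The same arithmetic applies to the other reading frames quoted in these Examples: the number of bases up to and including the stop codon, divided by three, minus one.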

Figure 23 is a listing of additional cDNA sequence obtained for CH1-9a11-2, comprising
approximately 1934 base pairs 5' from the sequence of Figure 8. The additional sequence data was
obtained by rescuing and amplifying two further fragments of CH1-9a11-2 cDNA. Nested primers
were designed ~100 base pairs downstream from the 5' end of the known sequence. The primers
were used in a nested amplification assay with AP1 and AP2, using the CLONTECH Marathon™
cDNA Amplification Kit as described above. The template for the first upstream fragment was
reverse-transcribed polyadenylated RNA from breast cancer cell line 600PE, as described earlier.
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This fragment was sequenced, and another set of nested primers was designed. The template for the 
next upstream fragment was a Marathon™ ready cDNA preparation from human testes, also supplied
by CLONTECH. 

The nucleotide sequence shown in Figure 23 comprises an open reading frame through to
the 5' end. Figure 24 shows the corresponding protein translation. About another 500-1000
bases are predicted to be present in the upstream direction of CH1-9a11-2, with the protein encoding
sequence beginning somewhere within this additional sequence. Sequencing of the encoding region is
completed by obtaining additional CH1-9a11-2 fragments in this direction.

A GENINFO® BLAST search of nucleotide and peptide sequence databases was performed
through the National Center for Biotechnology Information on February 23, 1996. Short segments of
homology with other reported human sequences were found at the nucleotide level (<500 base pairs),
but none with any ascribed function in the respective identifier. At the amino acid level, no identity
higher than 30% was found with any reported eukaryotic sequences.

A CH1-9a11-2 cloned insert has been used to probe the level of relative expression in
polyadenylated RNA from a panel of tissue sources. The RNA was obtained already prepared for
Northern blot analysis (CLONTECH Catalog # 7759-1, 7760-1 and 7756-1). The manufacturer
produced the blots from approximately 2 µg of poly-A RNA per lane, run on a denaturing
formaldehyde 1.2% agarose gel, transferred to a nylon membrane, and fixed by UV irradiation. The
relative CH1-9a11-2 expression observed at the RNA level is shown in Table 5:

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brain
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placenta            ++
lung
liver               +/-
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kidney
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thymus
prostate
testis              +++
ovary               ++
small intestine     +
colon               +/-
peripheral blood    +/-

Relatively elevated levels of expression were observed in heart, placenta, pancreas, prostate, testis
and ovary. The level of expression in breast cancer cell lines is also relatively high (about +^'*"*- on 
the scale), since the Northern analysis performed on these lines (described above) was conducted on 

total cellular RNA, of which polyadenylated RNA constitutes only about 5%. It is likely that the
CH1-9a11-2 gene is involved in a biological process that is typical to the tissue types showing medium to
high levels of expression, which may relate to increased tissue growth or metabolism.

Since the obtained sequence is shorter than the apparent size of mRNA observed in
Northern analysis (Table 1), an additional polynucleotide segment is believed to be present at the 5'
end of the sequence shown in SEQ. ID NO:15. Further sequence data at the 5' end is deduced by
obtaining additional cloned cDNA using standard techniques. Briefly, in one approach, mRNA from
breast cancer cell lines MDA-453 and/or 600PE is cloned and screened using primers based on
sequence data from SEQ. ID NO:15. Two nested primers of about 20 nucleotides are prepared, the
innermost about 150 base pairs from the 5' end, and the outermost about 170 base pairs from the 5'
end. The outermost primer is used to synthesize a first cDNA strand complementary to the mRNA in
the upstream direction. Second strand synthesis is performed using reagents in a CLONTECH
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Marathon™ cDNA amplification kit according to manufacturer's directions. The double-stranded DNA
is then ligated at the 5' end of the coding sequence with the double-stranded adaptor fragment
provided in the kit. A first PCR amplification (about 30 cycles) is performed using the first adapter
primer from the kit and the outermost RNA-specific primer, and a second amplification (about 30
cycles) is performed using the second adapter primer and the innermost RNA-specific primer. In an
alternative approach, a CLONTECH RACE-READY single-stranded cDNA from human placenta is
PCR amplified using nested 5' anchor primers in combination with the outermost and innermost
RNA-specific primers. Amplified DNA obtained using either approach is analyzed by gel electrophoresis,
and cloned into plasmid vector pCRII. Clones are screened, as necessary, using the 2.5 kilobase
CH1-9a11-2 insert. Clones corresponding to full-length mRNA (4.5 kb or 5.5 kb; Table 1), or cDNA
fragments overlapping at the 5' end, are selected for sequencing. Compared with the 4.5 kb form,
additional polynucleotide segments may be present in the 5.5 kb form within the encoding region, or in
the 5' or 3' untranslated region.
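The nested primer layout described above (two antisense primers of about 20 nucleotides, positioned about 150 and about 170 base pairs from the 5' end of the known sequence) can be sketched as follows. This is a minimal illustration under assumptions: the function names and the placeholder sequence are hypothetical, the offsets are treated as the 1-based start of each primer's binding window, and no melting-temperature or specificity checks are attempted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def antisense_primer(sense_seq, start, length=20):
    """Primer complementary to sense bases start..start+length-1 (1-based)."""
    window = sense_seq[start - 1:start - 1 + length]
    if len(window) < length:
        raise ValueError("known sequence is too short for this primer")
    return reverse_complement(window)


def nested_primers(sense_seq, inner_start=150, outer_start=170, length=20):
    """Outer primer for first-strand synthesis and first PCR; inner for the second PCR."""
    return {
        "inner": antisense_primer(sense_seq, inner_start, length),
        "outer": antisense_primer(sense_seq, outer_start, length),
    }


# Placeholder sense sequence (hypothetical, not taken from SEQ. ID NO:15).
known_sense = "A" * 149 + "C" * 20 + "G" * 31
primers = nested_primers(known_sense)
print(primers["outer"])
print(primers["inner"])
```

Because both primers are antisense, extension from either one proceeds toward the unknown 5' end of the transcript, and the inner primer lies within the product of the outer primer.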

Example 5: Chromosome 8 gene CH8-2a13-1

One of the cDNAs obtained corresponded to a gene that mapped to Chromosome 8. Figure 2
shows the Southern blot analysis for the corresponding gene in various DNA digests. Lane 1 (P12) is
the control preparation of placental DNA; the rest show DNA obtained from human breast cancer cell
lines. Panel A shows the pattern obtained using the 32P-labeled CH8-2a13-1 cDNA probe. Panel B
shows the pattern obtained with the same blot using the 32P-labeled D2S6 probe as a loading control.
The sizes of the restriction fragments are indicated on the right.

Figure 3 shows the Northern blot analysis for RNA overabundance. Lanes 1-3 show the level
of expression in cultured normal epithelial cells. Lanes 4-19 show the level of expression in human
breast cancer cell lines. Panel A shows the pattern obtained using the CH8-2a13-1 probe; panel B
shows the pattern obtained with beta-actin cDNA, a loading control.

The results are summarized in Table 6. The scoring method is the same as for Example 4.
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+ = Gene duplication or RNA overabundance; - = no duplication or overabundance; nd = not done.

* Degree of gene duplication is reported relative to placental DNA preparations.

** Degree of RNA overabundance is reported relative to the highest level observed for several cultures of
normal epithelial cells.

The gene corresponding to CH8-2a13-1 showed clear evidence of duplication in 12 out of 17
(71%) of the cells tested. RNA overabundance was observed in 14 out of 17 (82%). Thus, 11% of
the cells had achieved RNA overabundance by a mechanism other than gene duplication.
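The incidence figures quoted above follow from simple arithmetic, which the short sketch below reproduces. The function name is hypothetical and the calculation is an illustration of how the percentages relate, not a description of any analysis software used.

```python
def incidence_percent(positive, tested):
    """Percentage of tested cell lines scored positive, rounded to a whole number."""
    return round(100 * positive / tested)


duplicated = incidence_percent(12, 17)    # gene duplication
overabundant = incidence_percent(14, 17)  # RNA overabundance
difference = overabundant - duplicated    # overabundance not explained by duplication
print(duplicated, overabundant, difference)
```

The difference is best read as a lower bound: it equals the share of lines with overabundance but no duplication only if every duplicated line also shows overabundance.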

Since the known oncogene c-myc is located on Chromosome 8, the Southern analysis was
also conducted using a probe for c-myc. At least 2 of the breast cancer cells showing duplication of
the gene corresponding to CH8-2a13-1 did not show duplication of c-myc. This indicates that
the gene corresponding to CH8-2a13-1 is not part of the myc amplicon.

The sequence of 150 bases from the 5' end of the cDNA fragment is shown in Figure 22
(SEQ ID NO:3). There was no substantial homology to any known gene in GenBank. One of the
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three possible reading frames was found to be open, with the amino acid sequence shown in Figure
22 (SEQ ID NO:4).

The CH8-2a13-1 gene was further characterized by obtaining additional sequence
information. A λ-GT10 cDNA library from the breast cancer cell line BT474 (Example 2) was
screened using the initial cDNA insert, and clones with a 3.0 kb and a 4.0 kb insert were identified.
The two identified clones were subcloned into plasmid vector pCRII. T7 and Sp6 primers for regions
flanking the cDNA inserts were used as initial sequencing primers. Sequencing continued by walking
along the region of interest by standard techniques, using sequencing primers based on data already
obtained. The two inserts were found to overlap (Figure 6). Primers used are those designated 1-25
in Figure 10.

A third clone of about 600 bp (designated pCH8-600) overlapping on the 5' end (Figure 6)
was obtained using the CLONTECH Marathon™ cDNA Amplification Kit. Briefly, two DNA primers, CH8a
and CH8b (Figure 10), were synthesized. Polyadenylated RNA from breast cancer cell line BT474
was reverse transcribed using the CH8b primer. After second strand synthesis, adaptor DNA provided in
the kit was ligated to the double-stranded cDNA. The 5' end cDNA of CH8-2a13-1 was then amplified
by PCR using primers CH8a and AP1 (provided in the kit). To increase the specificity of the PCR
products, the first PCR products were reamplified by PCR using nested primers CH8a and AP2
(provided in the kit). The PCR products were cloned into pCRII vector (Invitrogen) and screened with
the CH8-2a13-1 probe.

By sequencing relevant portions of the three clones, a nucleic acid sequence of 3982 base
pairs between the 5' end and the poly-A tail of CH8-2a13-1 was determined. The DNA sequence is
shown in Figure 11 (SEQ. ID NO:18). Bases 1-152 are believed to be a 5' untranslated region. The
longest open reading frame is in frame 3 from base 153 to 3911, and codes for 1252 amino acids
before the stop codon. The corresponding amino acid sequence of this frame is shown in the upper
panel of Figure 12 (SEQ. ID NO:19). The sequence predicted for the translated protein is shown in
the lower panel of Figure 12 (SEQ. ID NO:20).

A GENINFO® BLAST search of nucleotide and peptide sequence databases was performed
through the National Center for Biotechnology Information on March 26, 1996. The sequences were
found to be about 99% identical at the nucleotide and amino acid level with bases 343-4103 of
KIAA0196 protein (N. Nomura et al., in press; sequence submitted to the DDBJ/EMBL/GenBank
databases on March 4, 1996). KIAA0196 was one of 200 different cDNAs cloned at random from
an immature male human myeloblast cell line. KIAA0196 has no known biological function, and is
described by Nomura et al. as being ubiquitously expressed.

A fourth clone of about 600 bp overlapping pCH8-600 at the 5' end has also been obtained.
Briefly, a DNA primer was synthesized corresponding to about the first 20 nucleotides at the 5' end of the
predicted cDNA sequence, and used along with a primer based on the pCH8-600 sequence to
reverse-transcribe RNA from breast cancer cell line BT474. The product was cloned into pCRII vector
(Invitrogen) and screened with a CH8-2a13-1 probe. The new clone is sequenced along both strands
to obtain additional 5' untranslated sequence data for the cDNA. The predicted compiled cDNA
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nucleotide sequence of CH8-2a13-1 cDNA is shown in Figure 13 (SEQ. ID NO:21). The
corresponding amino acid sequence of this frame is shown in Figure 14 (SEQ. ID NO:22). A
polynucleotide comprising the compiled sequence is assembled by joining the insert of this fourth
clone to pCH8-4k within the shared region. Briefly, CH8-4k is cut with XbaI and NotI. The fourth
clone is cut with BamHI and XbaI. The ligated polynucleotide is then inserted into pCRII cut with
BamHI and NotI.

A CH8-2a13-1 cloned insert has been used to probe the level of relative expression in
polyadenylated RNA from a panel of tissue sources obtained from CLONTECH, as in Example 4.
The relative CH8-2a13-1 expression observed at the mRNA level is shown in Table 7:
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Relative levels of expression observed were as follows: Low levels of expression were observed in
adult peripheral blood leukocytes (PBL), brain, placenta, lung, liver, skeletal muscle, kidney, and
pancreas. Medium levels of expression were observed in adult heart, spleen, thymus, prostate, testis,
ovary, small intestine, and colon. High levels of expression were observed in four fetal tissues tested:
brain, lung, liver and kidney. The level of expression in breast cancer cell lines is relatively high
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(about on the scale), since the Northern analysis performed on these lines was conducted on
total cellular RNA. It is likely that the CH8-2a13-1 gene is involved in a biological process that is
typical to the tissue types showing medium to high levels of expression, which may relate to increased
tissue growth or metabolism.

Example 6: Chromosome 13 gene CH13-2a12-1

One of the cDNAs obtained corresponded to a gene that mapped to Chromosome 13. Figure
4 shows the Southern blot analysis for the corresponding gene in various DNA digests. Lanes 1 and
2 are control preparations of placental DNA; the rest show DNA obtained from human breast cancer
cell lines. Panel A shows the pattern obtained using the CH13-2a12-1 cDNA probe; panel B shows
the pattern using D2S6 probe as a loading control. The sizes of the restriction fragments are
indicated on the right.

Figure 5 shows the Northern blot analysis for RNA overabundance of the CH13-2a12-1 gene. 
Lanes 1-3 show the level of expression in cultured normal epithelial cells. Lanes 4-19 show the level 
of expression in human breast cancer cell lines. Panel A shows the pattern obtained using the 
CH13-2a12-1 probe; panel B shows the pattern obtained with beta-actin cDNA, a loading control. The 
apparent size of the mRNA varied depending upon conditions of electrophoresis. Full-length mRNA is
believed to occur at sizes of about 3.2 and 3.5 kb. 

The results of the RNA abundance comparison are summarized in Table 8. The scoring
method is the same as for Example 4.
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+ = Gene duplication or RNA overabundance; - = no duplication or overabundance; nd = not done

* Degree of gene duplication is reported relative to placental DNA preparations.

** Degree of RNA overabundance is reported relative to the highest level observed for several cultures
of normal epithelial cells.

The gene corresponding to CH13-2a12-1 was duplicated in 7 out of 16 (44%) of the cells
tested. Three of the positive cell lines (600PE, BT474, and MDA435) had been studied previously by
comparative genomic hybridization, but had not shown amplified chromatin in the region where
CH13-2a12-1 has been mapped in these studies.

RNA overabundance was observed in 13 out of 16 (81%) of the cell lines tested. Thus, 37%
of the cells had achieved RNA overabundance by a mechanism other than gene duplication.
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Cells from primary breast tumors have also been analyzed for duplication of the
chromosome 13 gene. Ten of the 82 tumors analyzed (12%) were positive, confirming that
duplication of this gene is not an artifact of in vitro culture.

The sequence of 107 bases from the 5' end of the 1.5 kb cDNA fragment is shown in Figure
22 (SEQ ID NO:5). There was no substantial homology to any known gene in GenBank. One of the
three possible reading frames was found to be open, with the predicted amino acid sequence shown
in Figure 22 (SEQ ID NO:6).

The CH13-2a12-1 gene was further characterized by obtaining additional sequence
information. A λ-GT10 cDNA library from the breast cancer cell line BT474 (Example 2) was
screened using the initial cDNA insert, and clones with a 3.5 kilobase and a 1.6 kilobase insert were
identified. The two identified clones were subcloned into plasmid vector pCRII. T7 and Sp6 primers
for regions flanking the cDNA inserts were used as initial sequencing primers. Sequencing continued
by walking along the region of interest by standard techniques, using sequencing primers based on
data already obtained. The two inserts were found to overlap (Figure 6). Primers used during
sequencing are shown in Figure 15.

By sequencing relevant portions of the 3.5 and 1.6 kb clones, a nucleic acid sequence of
3339 base pairs between the 5' end and the poly-A tail of CH13-2a12-1 was determined. The DNA
sequence is shown in Figure 16 (SEQ. ID NO:23). Bases 1-520 are believed to be a 5' untranslated
region. The longest open reading frame is in frame 2 from base 521 to 1838, and codes for 611
amino acids before the stop codon. The corresponding amino acid sequence of this frame is shown
in the upper panel of Figure 17 (SEQ. ID NO:24). The sequence predicted for the translated protein is
shown in the lower panel of Figure 17 (SEQ. ID NO:25). Bases 1838 to 3339 of the nucleotide
sequence are believed to be a 3' untranslated region, which is present in the 3.5 kb insert. The 3.5 kb
insert appears to be a splice variant (Figure 6), in which the 3' untranslated region consists of bases
1838-2797 in the sequence.

A GENINFO® BLAST search of nucleotide and peptide sequence databases was performed
through the National Center for Biotechnology Information on March 26, 1996. Short segments of
homology with other reported human sequences were found at the nucleotide level (<500 base pairs),
but none with any ascribed function in the respective identifier. At the amino acid level, the sequence
was found to share 33% identities and 5^% positives with 228 residues of the lin-19 protein of
Caenorhabditis elegans. This protein has been implicated in regulating the cell cycle of C. elegans
(ET Kipreos, W He & EM Hedgecock). The CH13-2a12-1 gene is suspected of a role in controlling
cell proliferation. "Controlling cell proliferation" in this context means that an abnormally high or low
level of gene expression at the RNA or protein level results in a higher or lower rate of cell
proliferation, or vice versa, compared with cells with an otherwise similar phenotype. There is also a
low-level homology between CH13-2a12-1 and VACM-1, a vasopressin-activated, calcium-mobilizing
receptor from rabbit kidney medulla (Burnatowska-Hledin et al). VACM-1 has a transmembrane
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sequence, whereas none has been detected in CH13-2a12-1. Nevertheless, it is possible that the
CH13-2a12-1 protein product has a Ca2+ binding or Ca2+ mobilizing function.

A CH13-2a12-1 cloned insert has been used to probe the level of relative expression in
polyadenylated RNA from a panel of tissue sources obtained from CLONTECH, as in Example 4.
The relative CH13-2a12-1 expression observed at the mRNA level is shown in Table 9:
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Relatively elevated levels of expression were observed in heart, skeletal muscle and testis.
The level of expression in breast cancer cell lines is relatively high (about on the scale), since
the Northern analysis performed on these lines was conducted on total cellular RNA. It is likely that
the CH13-2a12-1 gene is involved in a biological process that is typical to the tissue types showing
medium to high levels of expression, which may relate to increased tissue growth or metabolism.

Fragments corresponding to the CH13-2a12-1 gene have also been used to screen cell lines
derived from other types of cancer. Southern analysis showed that about 1 out of 4 breast cancer cell
lines tested have gene duplication of CH13-2a12-1. Northern analysis showed that about 3 out of 6
lines tested have overexpression of the corresponding RNA transcript.
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Example 7: Chromosome 14 gene CH14-2a16-1

One of the cDNAs obtained corresponded to a gene that mapped to Chromosome 14. Results
of the analysis are summarized in Table 10. The scoring method is the same as for Example 4.





Source      Gene          Degree*   RNA              Degree**
            duplication             overabundance

Normal                    1.00                       1.00
BT474       +             2.89                       2.57
MCF7        +             1.35      +                1.88
SKBR3                     2.58                       2.19
T47D        +             2.28      nd
MDA157      +             1.52      +                2.52
UACC812     +             2.23      nd
MDA361                    0.97      +                1.43
MDA453      +             1.58                       5.92
BT20                                                 1.07
600PE                     0.94                       2.00
MDA231      +             1.66                       2.19
CAMA-1                    0.92                       0.71
DU4475                    0.87      +                1.33
BT468                     0.46      nd
MDA134                    0.77      +                7.17

Incidence   8/15 (53%)              10/12 (83%)

+ = Gene duplication or RNA overabundance; - = no duplication or overabundance; nd = not done
* Degree of gene duplication is reported relative to placental DNA preparations.
** Degree of RNA overabundance is reported relative to the highest level observed for several cultures
of normal epithelial cells.

The gene corresponding to CH14-2a16-1 was duplicated in 8 out of 15 (53%) of the cells
tested. The sequence of 114 bases from the 5' end of the cDNA fragment is shown in Figure 22
(SEQ ID NO:7). There was no substantial homology to any known gene in GenBank. One of the
three possible reading frames was found to be open, with the predicted amino acid sequence shown
in Figure 22 (SEQ ID NO:8).
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The CH14-2a16-1 gene was further characterized by obtaining additional sequence 
information. A λ-GT10 cDNA library from the breast cancer cell line BT474 (Example 2) was
screened using the initial cDNA insert, and two clones were identified: one with a 1.6 kb insert and
the other with a 2.6 kb insert. The identified clones were subcloned into plasmid vector pCRII. The
1.6 kb insert was sequenced by using T7 and Sp6 primers for regions flanking the cDNA inserts as
initial sequencing primers. Sequencing continued by walking along the region of interest by standard
techniques, using sequencing primers based on data already obtained. Primers used are those
designated 1-11 in Figure 18.

A third clone (designated pCH14-800) overlapping on the 5' end (Figure 6) was obtained
using the CLONTECH Marathon™ cDNA Amplification Kit. Briefly, DNA primers CH14a, CH14b, CH14c
and CH14d (Figure 18) were prepared. Polyadenylated RNA from breast cancer cell line MDA453
was reverse transcribed using the CH14b primer. After second strand synthesis, adaptor DNA provided in
the kit was ligated to the double-stranded cDNA. The 5' end cDNA of CH14-2a16-1 was then
amplified by PCR using primers CH14b (or CH14c) and AP1 (provided in the kit). To increase the
specificity of the PCR products, the first PCR products were reamplified by PCR using nested primers
CH14a (or CH14d) and AP2 (provided in the kit). The PCR products were cloned into pCRII vector
(Invitrogen) and screened with the CH14-2a16-1 probe.

By sequencing pCH14-1.6 and pCH14-800, a nucleic acid sequence of 2021 base pairs
between the 5' end and the poly-A tail of CH14-2a16-1 has been determined. The DNA sequence is
shown in Figure 19 (SEQ. ID NO:26). The longest open reading frame is in frame 1 from base 1 to
792, and codes for 263 amino acids before the stop codon. The corresponding amino acid sequence
of this frame is shown in the upper panel of Figure 20 (SEQ. ID NO:27). The partial sequence
predicted for the translated protein is shown in the lower panel of Figure 20 (SEQ. ID NO:28). The 2.1
kb clone has not been sequenced, but is believed to consist of about the same region of the
CH14-2a16-1 cDNA as pCH14-1.6 and pCH14-800 combined.

A GENINFO® BLAST search of nucleotide and peptide sequence databases was performed
through the National Center for Biotechnology Information on March 26, 1996. Short segments of
homology with other reported human sequences were found at the nucleotide level (<500 base pairs),
but none with any ascribed function in the respective identifier. At the amino acid level, the sequence
was found to share homologies within the first 106 residues with an RNA binding protein from
Saccharomyces cerevisiae with the designation NAB2. NAB2 is one of the major proteins associated
with nuclear polyadenylated RNA in yeast cells, as detected by UV light-induced cross-linking and
immunofluorescence. NAB2 is strongly and specifically associated with nuclear poly(A)+ RNA in vivo.
Gene knock-out experiments have shown that this protein is essential to yeast cell survival
(Anderson et al.). Accordingly, the protein encoded by CH14-2a16-1 is suspected of having DNA or
RNA binding activity.

A fourth clone (pCH14-1.3) has been obtained that overlaps the pCH14-800 clone at the 5'
end (Figure 6). The method of isolation was similar to that for pCH14-800, using primers based on
the pCH14-800 sequence. Partial sequence data for pCH14-1.3 has been obtained by
one-directional sequencing from the 5' and 3' ends of the pCH14-1.3 clone. Figure 21 shows the
nucleotide sequence of the 5' end (SEQ. ID NO:29) and the amino acid translation of
the likely open reading frame (SEQ. ID NO:30); the nucleotide sequence of the 3' end (SEQ. ID
NO:31) and the likely open reading frame (SEQ. ID NO:32). This data is confirmed and additional
sequence between SEQ. ID NOS:29 and 31 is obtained by fully sequencing both strands of
pCH14-1.3. Once compiled, the sequence data from pCH14-1.3, pCH14-800 and pCH14-1.6 may be shorter
than the apparent size of mRNA observed in Northern analysis (Table 1). If necessary, further
sequence data at the 5' end is deduced by obtaining additional cloned cDNA according to approaches
described in this Example or Example 4.

Figure 25 is a listing of additional cDNA sequence obtained for CH14-2a16-1, comprising
approximately 1934 base pairs 5' from the sequence of Figure 19. The corresponding amino acid
translation is shown in the upper panel of Figure 26. The additional sequence data was obtained by
rescuing and amplifying further fragments of CH14-2a16-1 cDNA. Nested primers were designed
~100 base pairs downstream from the 5' end of the known sequence. The primers were used in a
nested amplification assay with AP1 and AP2, using the CLONTECH Marathon™ cDNA
Amplification Kit as described above. The template was a Marathon™ ready cDNA preparation from
human testes, also supplied by CLONTECH.

The nucleotide sequence shown in Figure 25 is closed at the 5' end. The lower panel of
Figure 26 shows what is predicted to be the sequence of the gene product, beginning at the first
methionine residue. The nucleotide sequence shown contains a point difference at the position
indicated by the underlining in Figure 25. A base determined to be A from the previously obtained
polynucleotide fragment was a G in the one used in this part of the experiment. This corresponds to a
change from E (glutamic acid) to G (glycine) in the protein sequence, at the position underlined in
Figure 26. This may represent a natural allelic variation.
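The link between an A-to-G base difference and a glutamic acid to glycine change can be illustrated with the standard genetic code. The sketch below is an assumption-laden illustration: the text does not state which codon is involved, so both glutamic acid codons are shown, and the helper name `point_substitution` is hypothetical.

```python
# Subset of the standard genetic code needed for this illustration.
CODON_TABLE = {"GAA": "E", "GAG": "E", "GGA": "G", "GGG": "G"}


def point_substitution(codon, index, new_base):
    """Return the codon with the base at the 0-based index replaced."""
    return codon[:index] + new_base + codon[index + 1:]


# Replacing the middle A of either glutamic acid codon with G gives a glycine codon.
for codon in ("GAA", "GAG"):
    variant = point_substitution(codon, 1, "G")
    print(codon, CODON_TABLE[codon], "->", variant, CODON_TABLE[variant])
```

In the standard code, a single A-to-G difference converts glutamic acid to glycine only at the second codon position, which is consistent with a point difference of the kind described.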

A CH14-2a16-1 cloned insert has been used to probe the level of relative expression in
polyadenylated RNA from a panel of tissue sources obtained from CLONTECH, as in Example 4.
The relative CH14-2a16-1 expression observed at the mRNA level is shown in Table 11:
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TABLE 11: Northern analysis

Tissue              CH14-2a16-1 mRNA

heart               +
brain               +
placenta
liver               +
skeletal muscle     +
kidney
pancreas
spleen              +
thymus              +
prostate            +
testis              ++++
ovary               +
small intestine     +
colon
peripheral blood    +/-


CH14-2a16-1 mRNA was particularly high in testis. The level of expression in breast cancer
cell lines is also quite high, since the Northern analysis performed on these lines was conducted on
total cellular RNA. It is likely that the CH14-2a16-1 gene is involved in a biological process that is
typical to the tissue types showing medium to high levels of expression, which may relate to increased
tissue growth or metabolism.

Five motifs corresponding to a zinc finger protein have been found in the CH14-2a16-1 
nucleotide sequence. Further zinc finger motifs may be present in CH14.2a16-1 in the upstream 
10 direction. Zinc linger motifs are present for example, in RNA polymerases I, II. and III from S. 
cersvlsiae, and are related to the zinc knuckle family of RNA/ssDNA-binding proteins found in the HIV 
nucleocapskj protein. The actual sequence observed In each of the five zinc finger motife of 
CH14-2a16-1 is: 

15 £i£S-(Xaa)s-Cys-{Xaa)4-a£S-{Xaa)j-HiS or (SEQ. ID NO:38) 

Cy»-<Xaa)s-Clffi-(Xaa) 5-C3«*-(Xaa)rtiia (SEQ. ID NO:39) 
which is indicated in Figure 20 by underlining. This is identical to the 7 zinc finger motifs of NAB2,
which make up an RNA/ssDNA binding region (Anderson et al.). Accordingly, the CH14-2a16-1 gene
product is suspected of having DNA or RNA binding activity, and may be specific for polyadenylated
RNA. It may very well play a role in the regulation of gene replication, transcription, the processing of
hnRNA into mature mRNA, the export of mRNA from the nucleus to the cytoplasm, or translation into
protein. This role in turn may be closely implicated in cell growth or proliferation, particularly as
manifest in tumor cells.
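The two zinc finger patterns given above (SEQ. ID NO:38 and SEQ. ID NO:39) differ only in the length of the middle spacer (4 or 5 residues), so they can be searched for with a single regular expression. The following is a minimal sketch, not part of the disclosed methods: the function name `find_zinc_fingers` and the toy input sequence are hypothetical, and Xaa is taken to mean any single residue in one-letter code.

```python
import re

# Cys-(Xaa)5-Cys-(Xaa)4-Cys-(Xaa)3-His  or  Cys-(Xaa)5-Cys-(Xaa)5-Cys-(Xaa)3-His
# "." stands for any residue (Xaa). The pattern sits inside a lookahead so that
# overlapping occurrences are also reported.
ZINC_FINGER = re.compile(r"(?=(C.{5}C.{4,5}C.{3}H))")


def find_zinc_fingers(protein):
    """Return (0-based start, matched residues) for each motif occurrence.

    `protein` is a one-letter amino acid string (hypothetical helper).
    """
    return [(m.start(), m.group(1)) for m in ZINC_FINGER.finditer(protein.upper())]


if __name__ == "__main__":
    # Toy sequence invented for illustration; it is not taken from any figure.
    toy = "MKCPEGKLCRYGASCTFRHAA"
    for start, motif in find_zinc_fingers(toy):
        print(start, motif)
```

A search of this kind only flags candidate motifs; whether a hit corresponds to a functional RNA/ssDNA binding region still has to be established experimentally.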

Example 8: Identification of other cancer-associated genes

cDNA fragments corresponding to additional cancer-associated genes are obtained by
applying the techniques of Examples 1 & 2 with appropriate adaptations. As before, cancer cells
are selected for use in differential display of RNA, based on whether they share a duplicated
chromosomal region according to Table 12:





TABLE 12

Chromosomal location   Cancer cells (reference)

1p22-32                small cell (Levin 1994)
1p22                   bladder (Kallioniemi 1995)
1p32-33                rhabdomyosarcoma (Steilen-Gimbel); breast (Ried 1995);
                       small cell lung (Ried 1994)
1q21-22                sarcoma (Forus 1995a & b); breast (Muleris 1994a)
1q24                   small cell (Levin 1994)
1q31                   bladder (Kallioniemi 1995)
1q32                   glioma (Muleris 1994b; Schrock)
1q                     head and neck (Speicher 1995), breast (Muleris 1994a)

2p23                   small cell lung (Ried 1994)
2p24-25                small cell lung (Levin 1994)
(not legible)          head and neck (Speicher 1995)
2q                     head and neck (Speicher 1995)
2q33-36                head and neck (Speicher 1995)

3p22-24                bladder (Voorter), small cell (Levin 1994)
3q24-26                bladder (Kallioniemi 1995), glioma (Kim), osteosarcoma (Tarkkanen)
3q25-26                ovarian (Iwabuchi)
3q26-term              head and neck (Speicher 1995)
3q                     small cell lung (Levin 1995; Ried 1994); head and neck (Speicher 1995)

4q12                   glioma (Schrock)

5p                     small cell lung (Levin 1994 & 1995; Ried 1994)
5p15.1                 glioma (Muleris 1994b)

6 (arm not legible)    osteosarcoma (Forus 1995a); breast (Ried 1995)
6 (arm not legible)    melanoma (Speicher)
  21-term

(not legible)          glioma (Schlegel 1994 & 1996; may be EGFR)
7p11-12                glioma (Muleris 1994b; Schrock), small cell lung (Ried 1994)
7q21-32                glioma (Kim; Muleris 1994b; Schrock)
7q21-22                head and neck (Speicher), glioma (Schrock)
7q33-term              head and neck (Speicher 1995)
(not legible)          colon (Schlegel 1995); glioma (Kim), head and neck (Speicher);
                       prostate (Visakorpi)

8q                     small cell lung (Ried 1994)
8q21                   bladder (Kallioniemi 1995)
8q24                   myeloid leukemia (Mohamad)
8q22-24                glioma (Kim; Muleris 1994b; remainder not legible)
8q24-25                small cell (Levin 1994; Ried 1994), breast (Muleris 1994a)
8q23-term              sarcoma (Forus 1995a), melanoma (Speicher)
8q24                   ovarian (Iwabuchi)
8q                     breast (Ried 1995; Isola; Muleris 1994a), small cell lung (Levin 1994 & 1995),
                       B-cell leukemias (Bentz 1994a), myeloid leukemia (Bentz 1994b),
                       glioma (Schlegel), head and neck (Speicher 1995),
                       prostate (Cher; Visakorpi)

(not legible)          head and neck (Speicher)
9p                     head and neck (Speicher)
(not legible)          glioma (Muleris 1994b)
9p13                   breast (Muleris 1994a)

10p                    head and neck (Speicher 1995)
10p13-14               bladder (Voorter)
10q22                  breast (Muleris 1994a)

11q13                  head and neck (Speicher 1995), breast (Muleris 1994a)

12                     B-cell leukemias (Bentz 1995a)
12p                    head and neck (Speicher 1995), glioma (Schrock)
12q                    glioma (Schlegel 1994)
12q12-15               bladder (Voorter), osteosarcoma (Tarkkanen), liposarcoma (Suijkerbuijk)
12q21.3-22             liposarcoma (Suijkerbuijk)

13                     colon (Schlegel 1995)
13q                    breast (Ried 1995), head and neck (Speicher 1995)
13q21-34               bladder (Kallioniemi 1995)
13q32-term             head and neck (Speicher 1995), small cell lung (Ried 1994)

14q                    head and neck (Speicher 1995)

15q26                  breast (Muleris 1994a)

16                     head and neck (Speicher 1995)
16p                    breast (Ried 1995)
16p11.2                breast (Muleris 1994a)

17                     head and neck (Speicher 1995)
17p11-12               osteosarcoma (Forus 1995a; Tarkkanen)
17q                    breast (Ried 1995), small cell lung (Ried 1994)
17q21.1                breast (Muleris 1994a)
17q22-23               bladder (Voorter), breast (Muleris 1994a)
17q22-24               breast (Kallioniemi 1994)

18p11                  bladder (Voorter)

19q13.1                small cell lung (Ried 1994)

20p                    head and neck (Speicher 1995)
20q                    ovarian (Iwabuchi), colon (Schlegel 1995), breast (Isola; Tanner)
20q13.3                breast (Muleris 1994a; Kallioniemi 1994)

22q                    head and neck (Speicher 1995)
22q11-13               bladder (Voorter), glioma (Schrock)

X                      prostate (Visakorpi)
Xq                     small cell lung (Levin 1995)
Xq24                   small cell (Levin 1994)
Xq11-13                prostate (Visakorpi), osteosarcoma (Tarkkanen)



Control RNA is prepared from normal tissues to match that of the cancer cells in the
experiment. Normal tissue is obtained from autopsy, biopsy, or surgical resection. Absence of
neoplastic cells in the control tissue is confirmed, if necessary, by standard histological techniques.
cDNA corresponding to RNA that is overabundant in cancer cells and duplicated in a proportion of
the same cells is characterized further, as in Examples 3-7. Additional cDNA comprising an entire
protein-product encoding region is rescued or selected according to standard molecular biology
techniques as described elsewhere in this disclosure.
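The selection criterion described in this Example (RNA overabundant in several cancer cells, with the corresponding gene duplicated in a proportion of those same cells) can be pictured as a simple filter over measured ratios. The sketch below is illustrative only. The `Candidate` record, the function `select_candidates`, and the numeric cutoffs are all assumptions introduced here; the disclosure does not specify numeric thresholds, and in practice the comparison is made by inspecting display gels and blots.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One displayed cDNA fragment (hypothetical record type)."""
    name: str
    # cell line -> RNA abundance relative to control cells
    rna_ratio: dict = field(default_factory=dict)
    # cell line -> gene copy number relative to control cells
    dna_ratio: dict = field(default_factory=dict)


def select_candidates(cands, rna_cutoff=2.0, dna_cutoff=1.5, min_rna_lines=2):
    """Keep cDNAs overabundant in at least `min_rna_lines` cancer cell lines
    and duplicated in at least one of those same lines.

    All cutoffs are placeholder assumptions, not values from the disclosure.
    """
    selected = []
    for c in cands:
        over = sorted(l for l, r in c.rna_ratio.items() if r >= rna_cutoff)
        dup = [l for l in over if c.dna_ratio.get(l, 1.0) >= dna_cutoff]
        if len(over) >= min_rna_lines and dup:
            selected.append((c.name, over, dup))
    return selected


if __name__ == "__main__":
    # Invented numbers and placeholder cell line names, for illustration only.
    panel = [
        Candidate("cDNA_A", {"line_1": 5.0, "line_2": 3.0, "line_3": 1.0},
                  {"line_1": 4.0, "line_2": 1.0}),
        Candidate("cDNA_B", {"line_1": 5.0}, {"line_1": 4.0}),
    ]
    print(select_candidates(panel))
```

Requiring duplication in only some of the overabundant lines reflects the observation, noted for the four genes already identified, that RNA overabundance can occur without gene duplication.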

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Claims 



What is claimed as the invention is: 



1. An isolated polynucleotide comprising a linear sequence of at least 10 nucleotides identical to
a linear sequence contained in a polynucleotide selected from the group consisting of
CH8-2a13-1, CH13-2a12-1, CH14-2a16-1, and CH1-9a11-2.

2. An isolated polynucleotide comprising a linear sequence of at least 40 consecutive
nucleotides at least 90% identical to a linear sequence contained in a sequence selected
from the group consisting of SEQ. ID NO:15, SEQ. ID NO:18, SEQ. ID NO:21, SEQ. ID
NO:23, SEQ. ID NO:26, SEQ. ID NO:29, SEQ. ID NO:31, SEQ. ID NO:33, and SEQ. ID
NO:35; but not in any of SEQ. ID NOS: 1, 3, 5, and 7.

3. The isolated polynucleotide of claim 2, comprising a linear sequence of at least 100
consecutive nucleotides at least 90% identical to a sequence contained in the selected
sequence.

4. The isolated polynucleotide of claim 2, comprising a linear sequence of at least 40
consecutive nucleotides at least 95% identical to a sequence contained in the selected
sequence.

5. An isolated polynucleotide comprising a linear sequence of at least 40 consecutive
nucleotides that hybridizes with a DNA having a sequence selected from the group consisting
of SEQ. ID NO:15, SEQ. ID NO:18, SEQ. ID NO:21, SEQ. ID NO:23, SEQ. ID NO:26, SEQ.
ID NO:29, SEQ. ID NO:31, SEQ. ID NO:33, and SEQ. ID NO:35; under conditions where it
does not hybridize with SEQ. ID NOS: 1, 3, 5, 7, or any other DNA from a human cell.

6. The isolated polynucleotide of claim 5, wherein the linear sequence is at least 100
consecutive nucleotides.

7. An isolated polynucleotide comprising a sequence of at least 40 consecutive nucleotides that
hybridizes with an RNA having a sequence selected from the group consisting of SEQ. ID
NO:15, SEQ. ID NO:18, SEQ. ID NO:21, SEQ. ID NO:23, SEQ. ID NO:26, SEQ. ID NO:29,
SEQ. ID NO:31, SEQ. ID NO:33, and SEQ. ID NO:35; under conditions where it does not
hybridize with SEQ. ID NOS: 1, 3, 5, 7, or any other RNA from a human cell.

8. The isolated polynucleotide of claim 7, wherein the linear sequence is at least 100
consecutive nucleotides.

9. The isolated polynucleotide of any of claims 2-8, wherein said linear sequence is contained in
a duplicated gene or overabundant RNA in cancerous cells.

10. The isolated polynucleotide of any of claims 2-8, which is a CH13-2a12-1 polynucleotide, and
is contained in an encoding region for a protein or RNA molecule that controls cell
proliferation.

11. The isolated polynucleotide of any of claims 2-8, which is a CH14-2a16-1 polynucleotide, and
is contained in an encoding region for a protein with DNA or RNA binding activity.

12. The isolated polynucleotide of any of claims 2-6, present in a recombinant plasmid deposited
under ATCC Accession No. 98074.

13. The isolated polynucleotide of any of claims 2-8, present in a recombinant phage deposited
under ATCC Accession No. 97595.

14. The isolated polynucleotide of any of claims 2-8, present in the XBCBT474 cDNA library
deposited under ATCC Accession No. 97594.

15. An isolated polynucleotide comprising a linear sequence of polynucleotides essentially
identical to a sequence selected from the group consisting of SEQ. ID NO:15, SEQ. ID
NO:18, SEQ. ID NO:21, SEQ. ID NO:23, SEQ. ID NO:26, SEQ. ID NO:29, SEQ. ID NO:31, SEQ.
ID NO:33, and SEQ. ID NO:35.

16. An isolated polypeptide comprising a linear sequence of at least 5 amino acid residues
identical to a sequence encoded by a polynucleotide selected from the group consisting of
CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and CH14-2a16-1.

17. An isolated polypeptide comprising a linear sequence of at least 5 consecutive amino acids
identical to a linear sequence contained in a sequence selected from the group consisting of
SEQ. ID NO:17, SEQ. ID NO:20, SEQ. ID NO:22, SEQ. ID NO:24, SEQ. ID NO:28, SEQ. ID
NO:30, SEQ. ID NO:32, SEQ. ID NO:34, and SEQ. ID NO:37; but not in any of SEQ. ID
NOS: 2, 4, 6, and 8.

18. The isolated polypeptide of claim 17, comprising a linear sequence of at least 15 consecutive
amino acids at least 90% identical to a linear sequence contained in the selected sequence.
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19. The isolated polypeptide of claim 17 or 18, wherein said linear sequence is encoded in a 
duplicated gene or overabundant RNA in cancerous cells. 

20. The isolated polypeptide of claim 17 or 18, which is overexpressed in cancerous cells. 

21. The isolated polypeptide of claim 17 or 18, wherein the polynucleotide selected from said
group is a CH1-9a11-2 polynucleotide, and the polypeptide is a transmembrane polypeptide.

22. An isolated polypeptide comprising a linear sequence of amino acids essentially identical to a
sequence selected from the group consisting of SEQ. ID NO:17, SEQ. ID NO:20, SEQ. ID
NO:22, SEQ. ID NO:24, SEQ. ID NO:28, SEQ. ID NO:30, SEQ. ID NO:32, SEQ. ID NO:34,
and SEQ. ID NO:37; but not in any of SEQ. ID NOS: 2, 4, 6, and 8.

23. An isolated polynucleotide comprising an encoding sequence for the polypeptide of any of
claims 17 to 22.

24. A monoclonal or isolated polyclonal antibody specific for the polypeptide of claim 22.

25. A method of detecting gene duplication in cancerous cells, comprising the steps of:

a) reacting DNA contained in a clinical sample with a reagent comprising the
polynucleotide of claims 2-8, said clinical sample having been obtained from an
individual suspected of having cancerous cells; and

b) comparing the amount of any complexes formed between the reagent and the DNA in
the clinical sample with the amount of any complexes formed between the reagent and
DNA in a control sample.

26. A method of detecting overabundance of RNA in cancerous cells, comprising the steps of:

a) reacting RNA contained in a clinical sample with a reagent comprising the
polynucleotide of claim 2-8, said clinical sample having been obtained from an individual
suspected of having cancerous cells; and

b) comparing the amount of any complexes formed between the reagent and the RNA in
the clinical sample with the amount of any complexes formed between the reagent and
RNA in a control sample.

27. A method of determining gene duplication or overabundance of RNA in cancerous cells,
comprising the steps of:

a) amplifying DNA or RNA in a clinical sample with a primer comprising the polynucleotide
of claim 2-8 to yield an amplified polynucleotide, said clinical sample having been
obtained from an individual suspected of having cancerous cells; and

b) comparing the amount of polynucleotide amplified from the DNA or RNA with the
amount of polynucleotide amplified from DNA or RNA from a control sample.

28. A method of screening for cancer associated with a gene duplication in an individual,
comprising the steps of:

a) determining gene duplication in cells from the individual according to the method of claim
25; and

b) correlating any gene duplication determined in step a) with an increased risk for the
cancer.

29. A method of screening for cancer associated with overexpression of RNA in an individual,
comprising the steps of:

a) determining overexpression of RNA in cells from the individual according to the method
of claim 26; and

b) correlating any RNA overexpression determined in step a) with an increased risk for the
cancer.

30. A method of screening for cancer associated with a gene duplication or overexpression of
RNA in an individual, comprising the steps of:

a) determining gene duplication or overexpression of RNA in cells from the individual
according to the method of claim 27; and

b) correlating any gene duplication or overexpression of RNA determined in step a) with an
increased risk for the cancer.

31. The method of any of claims 28-30, which is a screening method for breast cancer.

32. A diagnostic kit for detecting gene duplication or RNA overabundance in cells contained in an
individual as manifest in a clinical sample, comprising a reagent and a buffer in suitable
packaging, wherein the reagent comprises the polynucleotide of any of claims 2-6.

33. A method for detecting altered protein expression in cancerous cells, comprising the steps of:

a) reacting a polypeptide contained in a clinical sample with a reagent comprising the
antibody of claim 24, said clinical sample having been obtained from an individual
suspected of having cancerous cells; and

b) comparing the amount of any complexes formed between the reagent and the
polypeptide in the clinical sample with the amount of any complexes formed between the
reagent and a polypeptide in a control sample.

34. A diagnostic kit for detecting a polypeptide present in a clinical sample, comprising a reagent
and a buffer in suitable packaging, wherein the reagent comprises the antibody of claim 24.

35. A host cell genetically altered by the polynucleotide of any of claims 2 to 8 or claim 23.

36. A method of screening a pharmaceutical candidate, comprising the steps of:

a) separating progeny of the cell of claim 35 into a first group and a second group;

b) treating the first group of cells with the pharmaceutical candidate;

c) not treating the second group of cells with the pharmaceutical candidate; and

d) comparing the phenotype of the treated cells with that of the untreated cells.

37. A pharmaceutical preparation for use in cancer therapy, comprising the polynucleotide of
claim 2 to 8 or claim 23, said preparation being capable of reducing the pathology of
cancerous cells.

38. A method for treating an individual bearing cancerous cells, comprising administering the
pharmaceutical preparation of claim 37.

39. A pharmaceutical preparation for use in cancer therapy, comprising the antibody of claim 24,
said preparation being capable of reducing the pathology of cancerous cells.

40. A method for treating an individual bearing cancerous cells, comprising administering the
pharmaceutical preparation of claim 39.

41. A pharmaceutical preparation comprising the polypeptide of claim 17 or 18 in an
immunogenic form, and a pharmaceutically compatible excipient.

42. A method for treatment of cancer, comprising administration of the pharmaceutical
preparation of claim 41.

43. A method for obtaining cDNA corresponding to a gene that is duplicated or overexpressed
in cancer, comprising the steps of:

a) supplying an RNA preparation from control cells;

b) supplying RNA preparations from at least two different cancer cells;

c) displaying cDNA corresponding to the RNA preparations of step a) and step b) such that
different cDNA corresponding to different RNA in each preparation are displayed
separately;

d) selecting cDNA corresponding to RNA that is present in greater abundance in the
cancer cells of step b) relative to the control cells of step a);

e) supplying a digested DNA preparation from control cells;

f) supplying digested DNA preparations from at least two different cancer cells;

g) hybridizing the cDNA of step d) with the digested DNA preparations of step e) and step
f); and

h) further selecting cDNA from the cDNA of step d) corresponding to a gene that is
duplicated in the cancer cells of step f) relative to the control cells of step e).

44. The method of claim 43, wherein the two different cancer cells used to supply RNA in step
b) share a duplicated gene in the same region of a chromosome.

45. The method of claim 43, wherein RNA preparations from at least three different cancer
cells are supplied in step b).

46. The method of claim 43, wherein the three different cancer cells used to supply RNA in
step b) share a duplicated gene in the same region of a chromosome.

47. The method of claim 43, wherein the control cells of step a) are uncultured.

48. The method of claim 43, further comprising supplying a digested mitochondrial DNA
preparation; hybridizing the cDNA of step h) with the digested mitochondrial DNA
preparation; and further selecting cDNA from the cDNA of step h) corresponding to genes
that do not hybridize with the digested mitochondrial DNA preparation.
49. The method of claim 43, further comprising the steps of:

i) supplying an RNA preparation from control cells;

j) supplying RNA preparations from at least two different cancer cells;

k) hybridizing the cDNA of step h) with the RNA preparations of step i) and step j); and

l) further selecting cDNA from the cDNA of step h) corresponding to RNA that is present in
greater abundance in the cancer cells of step j) relative to the control cells of step i).

50. The method of claim 49, wherein the gene to which the cDNA corresponds is not
duplicated in at least one of the cancer cells used to supply the RNA in step j) relative to
the control cells of step e).

51. The method of claim 43, wherein the two different cancer cells used to supply the RNA
preparations in step b) are breast cancer cells.

52. The method of claim 43, wherein the two different cancer cells used to supply the RNA
preparations in step b) are from a common type of cancer, wherein the type of cancer is
selected from the group consisting of lung cancer, glioblastoma, pancreatic cancer, colon
cancer, prostate cancer, hepatoma, and myeloma.

53. The method of claim 43, wherein the two different cancer cells used to supply the digested
DNA preparations in step f) are breast cancer cells.

54. The method of claim 43, wherein the two different cancer cells used to supply the digested
DNA preparations in step f) are from a common type of cancer, wherein the type of cancer is
selected from the group consisting of lung cancer, glioblastoma, pancreatic cancer, colon
cancer, prostate cancer, hepatoma, and myeloma.

55. A method for obtaining cDNA corresponding to a gene that is deleted or underexpressed in
cancer, comprising the steps of:

a) supplying an RNA preparation from control cells;

b) supplying RNA preparations from at least two different cancer cells that share a deleted
gene in the same region of a chromosome;

c) displaying cDNA corresponding to the RNA preparations of step a) and step b) such that
different cDNA corresponding to different RNA in each preparation are displayed
separately; and

d) selecting cDNA corresponding to RNA that is present in lower abundance in the cancer
cells of step b) relative to the control cells of step a).

56. The method of claim 55, further comprising the steps of:

e) supplying a digested DNA preparation from control cells;

f) supplying digested DNA preparations from at least two different cancer cells;

g) hybridizing the cDNA of step d) with the digested DNA preparations of step e) and step
f); and

h) further selecting cDNA from the cDNA of step d) corresponding to a gene that is deleted
in the cancer cells of step f) relative to the control cells of step e).

57. A method for characterizing a gene that is duplicated or has altered expression in cancer,
comprising obtaining cDNA corresponding to the gene according to the method of any of
claims 43-56, and then sequencing the cDNA.

58. A method of screening a candidate drug for cancer treatment, comprising obtaining cDNA
corresponding to a gene that is duplicated or has altered expression in cancer according to
the method of any of claims 43-56, and comparing the effect of the candidate drug on a
cell genetically altered with the cDNA with the effect on a cell not genetically altered with
the cDNA.
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Figure 8(A) 



1 GAATACATAT ATAAATGGTG TTCAGTTAGA GTTGCTCTTT ATCGGCAGCG 

51 CAGCCGAACT GCTTTGAGTA AAGGAAAAGA TTATCTTOTC TTAGCTCAAC 

101 CACCCTTACT ACTTCCTGCG GAATCAGTAG ATCTTTCAGT ATTGCAACCT 

151 CTGAGTOGAG AATTGGAAAA TACGAATATA GAAAGGGAAG CTGAAACTGT 

201 TGTTCTGGGT GATTTAAGTA GTAGTATGCA CCAGGATCAC TTGGTGAATC 

251 ACACTGTAGA TGCAGTTGAA CTTGAACCAA GCCATTCTCA AACTCTTICT 

301 CAGTCTCTTC TTTTAGATAT TACCCCAGAA ATCAATCCCT TCCCTAAAAT 

351 AGAAGTATCT GAGTCTGTTG AATATGAGGC AGGACATATA CCATCACCAG 

401 TGATTCCCCA AGAGAGTTCT GTTGAGATCG ATAATGAAAC AGAACAAAAG 

451 TCTGAGAGCT TTAGTTCTAT AGAGAAACCA TCTATTACCT ATGAAACAAA 

501 TAAAG TTAAT GAGTTAATGG ATAATATTAT AAAAGAAGAT ATCAACTCCA 

551 TGCAAATTTT CACAAAGCTG TCTGAAACAA TAGTGCCACC AATAAATACA 

601 GCCACTGTAC CCGACAATGA AGATCGGGAA GCCAAAATGA ATATAGCTGA 

651 CACAGCAAAG GAAACTTTCA TTTCTGTTCT GGATTCTTCT TCATTACCTC 

701 AAGTAAAAGA AGAAGAACAG TCTCCAGAAG ATGCCCTTTT GAGAGGGTTA 

751 CAGAGGACAG CTACAGATTT TTATGCTCAA TTGCAAAATT CTACAGATCT 

801 AGGATATGCT AATGGAAATC TTGTACATCG ATCAAACCAA AAGGAGTCAG 

851 TATTTATGAG ACTTAATAAT CGTATTAAAG CCTTAGAAGT TAACATGTCT 

901 CTCAGTGGTC GCTATCTGGA GGAGCTTAGC CAAAGGTACC GAAAACAAAT 

951 GGAAGAAATG CAAAAGGCTT TCAACAAAAC AATCGTGAAA CTTCAGAATA 

1001 CTTCAAGAAT AGCAGAGGAG CAGGATCAGC GGCAAACTGA AGCCATCCAG 

1051 TTGCTACAGG CACAGCTGAC CAACATGACA CAGCTTGTTT CAAATTTATC 

1101 AGCAACAGTA GCAGAATTGA AACGGGAGGT TTCAGATCGA CAAAGCTATC 

1151 TTCTCATATC TrroGTTCTT TGTGTTGTCT TGGGACTGAT GCTTTCTATC 

1201 CAGCX3TTGTC GAAATACTTC TCAATTTGAT GGAGATTATA TTTCAAAACT 

1251 TCCTAAAAGT AATCAGTATC CAAGCCCTAA AAGGTGTTTC TCTTCCTATC 

1301 ATGATATCAA TTTGAAAAGA AGAACITCAT TCCCACTCAT GAGATCCAAG 

1351 TCTCTACAGT TAACTGGCAA AGAAGTAGAC CX:AAATGATT TGTACATTGT 

1401 AGAACCCCTC AAGTnTCTC CAGAAAAGAA GAAGAAGCGC TGCAAGTACA 

1451 AAATTGAAAA AATTGAGACC ATAAAGCCTG AAGAACCATT GCACCCCATA 

1501 GCCAATQGCG ACATAAAAGG AAGAAAGCCC TTTACGAACC AGAGAGATTT 

1551 TTCTAATATG GGAGAAGTTT ATCACTCTTC TTATAAAGGT CCTCCATCTC 

1601 AAGG AAGCTC AGAAACTTCA TCACAGTCAG AAGAGTCCTA TTTITGTGGC 

1651 ATTTCAGCTT GCACAAGTCt GTGCAATGGA CAGTCTCAAA AGACAAAAAC 

1701 TGAGAAGAGG GCTTTAAAAC GAAGACGATC TAAAGTCCAA GACCAAGGAA 

1751 AATTGATAAA AACTCTAATA CAGACTAAGT CGGGATCATT GCCX3AGCCTG 

1801 CATGACATAA TCAAAGGAAA CAAAGAGATC ACCGTGGGAA CATTTGGTGT 

1851 TACAGCAGTC TCGGGACATA TCTAAAATTA ATTGAACTTT TCATACAGAA 

1901 GACTTTTTTG TTGTTOTTCT TTGAAGAACA GTCTGTAGTA TTTCAAGGGT 

1951 TTGGGGGAGG GAGAAAATAT TAATGGGAAA GGCATTCAGA AATTATGGTT 

2001 TCTACCTTTT TAAAAAGTAG ATOGGATTCT GCTCAATCTT GGTTAATCAG 

2051 CTACAGrrXT ACAAAG CTGA TCACTTCCTA TAAGGACAAT GGTAGACATT 

2101 TTATAAAGAT GTTTnTCAC AAGATTAATT ACTGGGACAA AAGTAAirTC 

2151 GAAGCXrCAGT TCCTTAGGTG GGATAGGAAT GAAAGCCTAA ACCTCTTCCT 

2201 TTAGCTTTGT ICC m TlLT TGCACCTTCC CATATTTATC TGCCTTTTOT 

2251 CTATTTATAA TGCCACTGGA AGAGGAGGGA TAACTTTTTC TGTTATTTGA 

2301 TTTCTTTTAT AACTTOGTTA GGTTTrTGAA GCTGCAAACA CTACAATCCT 

2351 TIGAGQOGGT CTCTGCCTGA AGCTCAGGAG TGTGGATCAG ACAGTCTAAA 

2401 GATCCTAAAA ACTTGC CAAC TGGATCTITG TTTAGCAAAC TCACTGGAAA 

2451 TGAACACTTA ATCGAATTIT TAAGTCTGTT CTGTrAGGTA GATGGTGATG 

2501 CTCTTGTTAT TTTCACTTAT TCAGGCTGGA TrACTTCTTA CTTAGTTACT 

2551 AACTCAATGA GGAAAAAATC CXTTACAGGAT CTTmTTGC AAACAACTGA 



1 



WO 97/38085

Figure 8(B) 



PCT/US97/05930 



2601 TATATGCAGA CAAATTTTTG ACAAATTCAC CTTTTAAACA CXACXnTAAC 

2651 CGATTTGTGA AGGTTTICTr TAGCTTACAT TTTAAACATA CACAATAAAC 

2701 ACTAATCCTC CyU^CTTTCA CTCTTTTTAT TAGTATGAAT ATAAAATTTG 

2751 AAGGTITCGC CAATTAGTAC AAGTCTCATG ATATAATCAC AGCCTGCATA 

2801 CATATGCACA GATCCAGTTA GTGAGTTTGT CAAGCTTAAT CTAATTCGTT 

2851 AAGTCTAAAG AGATTATTAT TCCTTGATGT rroCTTTGTA TTGGCTACAA 

2901 ATGTGCAGAG GTAATACATA TGTGATGTCG ATC?rCTCTGT CXTXTlTm ' 

2951 GTCTTTAAAA AATAATTGGC AGCAACTGTA TTTGAATAAA ATGATTTCTT 

3001 AGTATGATTG TACAGTAATG AATGAAAGTG GAACATGTTT CTTTTTGAAA 

3051 GGGAGAGAAT TGACCATTTA TTGTTGTGAT GTTTAAGTTA TAACTTATro 

3101 AGCACTTTTA GTAGTGATAA CTGTTTTTAA ACTTGCCTAA TACCnTCIT 

3151 GGGTATTGTr TGTAATGTGA CTTATTTAAC GCCTTCnTG TTTCTTTAAG 

3201 Tl tJ ClU;m ' AGGTTAACAG CGTGTTTTAG AAGATTTAAA TITCTTTCCT 

3251 GTCTGCACAA TTAGCTATIC AGAGCAAGAG GGCCTGATTT TATAGAAGCC 

3301 CTTTGAAAAG AGGTCCAGAT GAGAGCAGAG ATACAGTGAG AAATTATGT6 

3351 ATCTSTGTGT TGTGGGAAGA GAATTTTCAA TATGTAACTA CX3GAGCTGTA 

3401 GTGCCAITAG AAACTGTGAA TTTCCAAATA AATCTGAACA CTTGTCTTTA 

3451 TT 
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1 EVIYXUCSVR VALYRQRSRT ALSKGKESYLV LAQPPLLLPA ESVDVSVLQP 

51 LSGELOmJI EHEAEIWLG DLSSSMHQDD LVMHIVEtflVE LEPSHSC^TLS 

101 QSLLLDITPE INPLPKIEVS ESVEYEAGKI PSFVIPQESS VEIDNETEQK 

151 SESFSSIEKP SITYETNKVN EUCNZIKED MNSMQIFIKL SETIVPPZNT 

201 AIVPCMEDGE AKMNIADTAK QTLZSWDSS SLPEV/KEEBQ SPECALLRGL 

251 QRTATDFY^ L(9i5TDUGVA NGNLVKGSNQ KESVFTIRLNN RIKALEVNMS 

301 LSGRYLEELS QRYRKQMEZK QKAFNKTZVK LQNTSRIAEE QDQROTEAIQ 

351 LLQAQLIlQfr QLVSNLSATV AEUCREVSDR QSYLVISLVL OA/LGLKLCM 

401 QRCR OTSQF D GDYISKLPKS NQYPSPKRCF SSYUOStOXR RTSFPLMRSK 

451 SLQLTCTEVD PNDLYIVEPL KFSPEKKKKR CKYKIEKIET IKPEaSPLHPI 

501 ANGDZKGRKP FTO QRDFSN M GEVYHSSYKG PPSEGSSETS SQSmSYFCG 

551 ISPiCtSLCNG Q SQKTO mOl iOiXRRRSKVQ DQGKLZKILI QTKSGSLPSL 

601 HDIIKGNKEI TVGOTGVTAV SGHI»N«LNF SYRRLfmrS IjKNSL-YLKG 

651 LGBGOraraK GI<»a«FliPF •KVDGIVLNL G«*ATVLQS* SLPIRTMVDI 

701 L«RCFFTRLI TGTKVIWKPS SLQGIGMKA- TSSFSFVPIS CTFPVLCAFC 

751 LFIMPLEEEX3 •LFIXFDFFY NFVRFLKLQT LQCFCGVCA* SSGVWIRQSK 

801 DPKNLPTGSL FSKLTGNEHL MEFLSLFC-V DGEALVIFTY SGWITSYLVT 

851 NSMRKKSLQD LFLQTTOICR QIFDKFTF^T RR-PICBGFL -LTT'TYTIN 

901 TNPPNFHCFY •YEYKI^RPG QLVQVS-YNH SUTIYAQIQL VSLSSLI»LV 

951 KSKEIIIP'C LLCIGYKCAE VIHM-CRCLC LFFCL^KIIG SNC1*IK»FL 

1001 SMTVQ--MKV EHVSF»KGEN •PFIWMFKL •LIQ1F«*»* LFLWLPNTFL 

1051 GYCL-CDLFN AFFVCLSCCF RLTACFRRFK FLSCLHN-LF RARGPDFIEA 

1101 P*KEV»!RAE IQ«EIM«SVC CGKRIFWM^L RSCSAIRNCE FPNKSEHLSL 



1 EYIYKWCSVR VALYRQRSRT ALSKGKDYLV LAQPPLLLPA ESVDVSVLQP 

51 LSGELENTOI EREAETWLG DLSSSMHQDD LVNHTVDAVE LEPSHSffTLS 

101 QSLLLDITPE INPLPKIEVS ESVEYEAGHI PSFVIPQESS VEIDNETEQK 

151 SESFSSIEKP SITyETTJKVN EIUDNIIKED MNSMQIFOKL SETIVPPINT 

201 ATVPDNEDGE AKMNIADTAK QTLISWDSS SLPEVKEEEQ SPEDALLRGL 

251 QRTATDFYAE LQNSTOLGYA NOILVHGSNQ KESVFMRLNN RIKALEVNMS 

301 LSGRYLEELS QRYRKQMEEM QKAFTJKTIVK LQNTSRIAEE QDQRffTEAIQ 

351 LLQAQLONMT QLVSNLSATV AEIiCREVSDR Q SYLVISLVL C\A/LGLMLCM 

401 QRCRNTSQFD GDYISKLPKS l^YPSPKRCF SSYDDMNLKR RTSFPI21RSk 

451 SLQLTSKEVD PNDLYlVEPL KFSPEKKKKR CKYKIEKIET IKPEEPLHPI 

501 ANGDIKGRKP FTOQRDFSNM GEVYHSSYKG PPSEGSSETS SQSEESYTCG 

551 ISACTSLCNG QSQKTKTEKR ALKRRRSKVQ DQGKLIKTLI QTKSGSLPSL 

601 HDIIKGNKEI TVGTPGVTAV SGHI 
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GTGCGCCGTG 
CTCTCTGCAC 
GCGAGTTAAT 
CAATGTTGGA 
AGGATTGTTT 
TGAGTTTATT 
AATATGGAGA 
TCGGAAAGCA 
ATTTCGTGAA 
AAAGTSTACA 
AATGAAGOGG 
AGATGGAAAA 
TACTQGTCAT 
GTTTCTTACT 
GGACGATATT 
GTCCCAAAAG 
ATCAACGAAT 
TATTTACAAC 
CCCTGGCAAA 
TCCATCCTTC 
CTTTCCAGAT 
TAGTAGATGC 
ACCXrrGGACC 
CAGTGAAAGA 
TAAGGGAGGA 
AGAGACTGCA 
AGCCTGTGAC 
TAACAGACTC 
ACTGCACAAT 
AGAAAAGCAA 
TGACTGAGCT 
GAGAAAAATG 
ATTGTCTTTA 
AACTGATACA 
AATCTGCAAG 
AATGATCAGA 
TCGTTOGGGA 
ATCATGCAAG 
AGCTACCTTC 
TTAATCAGGC 
GGAGAGTTGG 
GATGTITACA 
TTGAAGTGCC 
CTAGGCCCAC 
TACTGAAGGC 
TGGATCCAAA 
CXKX5TTGCCT 
GCCAAGTGAA 
GATTCCATCG 
CTGAAGATTT 
GCAAGAGTGT 
TOTACCAGTC 



GCGCGGCCXTG 
ACCTGGTTTC 
CATCCCCAGT 
CTTTCTAGCC 
CCTGTGGTAA 
CCTGCTGTGT 
TATCATATTT 
AACTGGATGC 
AACAACATAG 
TAAATATATT 
TTTATATTCA 
CAACTTCTAT 
TGACCAAAAG 
ACCGATACAG 
TGTAAGCTCC 
ACCATCCAAC 
CCTTCATCAG 
CAGGTCTCAG 
CCAAGCTGCC 
ACACCCATCA 
AATTGGGTAA 
TTGGGAACXrX 
•nrCAAATGT 
GTGCATGCTC 
GATGGTTCTG 
ATGTTGCCAT 
CCAAACAACA 
TCGGTACAAT 
TTGAGTrTAT 
ACCAAATQGG 
TGCTGATGTC 
AAAACCTTCA 
AAITATGATG 
AGCTTTGGAA 
TATGTCAGTT 
ACCATTAACA 
CCTTTCTTTC 
AAAGCATAAG 
CrrAAAGCTTO 
AAATCGCCCC 
TATCCTATGT 
TCTCTTCTAA 
TACCCGCCTG 
GATACGAGGT 
ATXTTTAATGA 
GGAGTTGCTG 
TTGCCCTGCA 
TTGATGCCCA 

TTcrrrrGAA 

GGCAGGAAGA 
AATAACTTTC 
CACTCATATT 



GCTGACAGGT 
ATCTAATAAT 
GTCCAGGCAC 
GAGAACAACX: 
TGCCATCATT 
TCAGGTTAAA 
GATTTCAGCT 
TAAGCCAGAG 
AAATTC3TGAC 
GTAGACTTAA 
GCAAACCTTA 
GTGAAGCACT 
ATTGAAGGAG 
TGCTGCTCGA 
TTCGAAGTAC 
TA1CCXX3AGA 
TATQGTCATT 
CGTATCCTTT 
ATGCTGTACG 
AGCAAAAATG 
TTAGTATTTA 
TACAAAGCTG 
CAGAGAACAG 
AAGTGCAGCA 
GACAATATCC 
CCGATGGCTG 
AACGCCTTCG 
CCCAGGATCC 
ACTCAAAGAG 
AGCATTACAA 
TTTTCAGGAG 
AGCTTGGTTC 
ATTCTACTGC 
GAGGnCAAG 

TTAAAGAGGA 
GCTTGGCAGT 
GGTAAATCCA 
CCTCTGCCCT 
GACXTTGCTCA 
GAGAAAAGTT 
AGATCATAAA 
GACAAAGACA 
TCCCAAGCTT 
TGAAAACGAC 
GAAGATGGAA 
TAGGGGACTG 
AGCTSAAAGA 
TACATACAGG 
AGTATCTCGT 
TAAGAACGAA 
CCAATACCCA 



TCTTTAATGG 
ATACAGACAC 
AGAGTAGTCG 
TCTCTGGCCA 
GCTGAACTTT 
A GACAG AGCT 
ATTTTAAGGG 
CTACAGGATT 
CAGATTTTAT 
ACAGATATCT 
GAAACTGTGC 
GTACTTATAT 
AAGICAGAGA 
TCTIVII jC IXj 
AGGTTATTCT 
GCTATTTCCA 
GGTCGACTCA 
GCCGGAGCAT 
TGATTCTCTA 
AGAGAGATAG 
CATGGGGATC 
CAAAAACTGC 
GCAAGCAGAT 
ATTTCTAAAA 
CAAAGCTTCT 
ATGCTTCATA 
TCAAATCAAG 
TCTICCAGCT 
ATGTTCAAGC 
GAAAGAGGGT 
TGAAACCCCT 
AGAGAGATCT 
TGCGGGCAGA 
AATTCCACXA 
ACTCGAA AGT 
GGITCTGATC 
TGATK3ACAG 
TCXATQGTTA 
CGATCTGCXX: 
GCGTGTCACA 
TTGCAGATCA 
GCTTCAGACC 
AGCTGAGGGA 
ACTCATGCTA 
TTTOGTTGGC 
TAAGGAAAGA 
ATATTCAACC 
GTTGGGAGCG 
ACTATGTCAA 
ATCATAAATT 
GATTCAAGAT 
AGTTTACCCC 



AGGAGCCAAT 
CAGCTCTGAG 
GTCCX5CCTCA 
AGCAATCCTA 
TGAGACTCTC 
GATCAACAGA 
TCCAGAATTA 
TAGATGAAGA 
TTAGCATTTC 
AGATGATCTC 
TTCTCAATGA 
GGAGTTATGC 
GAGGATGCTG 
ATTCAAATAT 
AGCXJVACCAG 
GAGAGTGCXrr 
GATCTGATGA 
CGCAGCACAG 
CTTTGAGCCT 
IGGATAAATA 
ACAGTTAATC 
TTTAAATAAT 
ATGCTACTGT 
GAAGGTTATT 
GAACTGCCTG 
CAGCAGACTC 
GACCAGATTC 
GCTG TTAGAT 
AAATGCTTTC 
TCGGAGCGGA 
AACCAGAGTG 
CAAAACAAAT 
AAAACTGTAC 
GTTGGAATCC 
TTCTTCATCA 
ACAA1GCAGA 
TTTCACATCC 
CTAAACTCAG 
CTTCTTCGTA 
GTACTATTCT 
TCCCAGAAAG 
CACGACATTA 
CTATGCTCAG 

TTTCcATrrr 

ATCATCAAGG 
GCTTGTGAAG 



ACCATGGATG 
CATTTATGGT 
ACAACGTGGA 



TCTCGATGAG 



r 
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CTCCAGATTT 


TCAGTAAAAT 



ACTCTGCAGA GAAATCCOGC GGATCACAGA 
TAGACCAGCT GAACACTTGG TATGATATGA 
AGCAGCCGCC TCITCTCAGA AATCCAGACC 
AAATGGCTTA GACAGGCTTC TGTGCnTAT 
ATTTCCTCAG TATGTITCAG AAAATTATCC 
GACACTTTAA AAACCCTCAT GAATGCTGTC 
CGCAAATTCA AATAAAATTT ATTTTTCCGC 
TTTGGACTGC GTATCTCGAG GCTATAATGA 
CTGAGGCAAC AGATTGCCAA TCAATTAAAT 
TAAACATCTG GCAGCTCCTC TGGAGAATCT 
ACATTGAAGC CCACTATCAG GACXXTTCAC 
AACACACTTT TATATGAAAT CACAGCXTTAT 
CAACCCACTG AATAAGATAT ACATAACAAC 
CAATTGTAAA CTTTCTATTT TTGATCGCTC 
AACAAAAATC TGGGAATGGT CTGCCGAAAA 
GCCACCACTT GTCCTGGGAC TGCTCACTCT 
GGTACACCGA GCAGCTCCTG GCGCTGATTG 
GTOGAGCAGT GTACAAGCCA GAAGATACCT 
GGGTGCCXnT CTGTTCCTGG AGGATTATCT 
GGAGGGTTGC TGAAGCACAT GTGCCTAATT 
ACAGTGCTGT AAC rG T ITl T CCTACTTCTT 
GATCTTCCCA CXATCACAAA TGAATTTGAA 
CTCATACAAC TCCATTTTTT CTGTCTATTA 
GAGTAAGATA TATCTCATGG CATTAGTTAA 
TCATGGTATT ACATGCAATT TATATCAGAT 
TACTGCCTCT CTTAAAOXSCT GAATGTAACT 
GTTTTATGTT CTAAAGAACT ATITCTGCAA 
AGTATTACTA GT 
Figure 12(A)



APWRGPADRF raOGANLSAH LVSSNNIQTP AIiRPVNHPQC PGTE^SVRLT 
KLDFLAQJNL CX^LRIVS CGNAIIAELL R LSEF IPAVF RLKI^ADQQK 
YGDIIFDFSY FKGPBUWESK UMCPELQIU. DffiFRQiNIE IVTRFYLAFQ 
SVHKYIVDLN RYLDDIHEGV YIQQTLETVL LNmGKQLLC EALYLYGVML 
LVIDQKIEGE VRERMLVSYY RYSAARSSAD SNMDDICaCLL RSTGYSSQPG 
AKRPSNYPES YFQKJPmES FISMVTGRLR SDDIYNQVSA YPLPEHRSTA 
LANQAAMLYV ILYFEPSIM TOQAXMREIV DKYFPDNWVI SIYMSITVNL 
VDAWEPYKAA RTAUaOTLDL SNVREQASRY ATVSERVHAQ VQQFUCBGYL 
REDIVLCNIP KLIUCXRDCN VAIRWI14LHT ADSACDPNNK RLRQIKDQIL 
TDSRYNPRIL FQUUTTAQF EFXUCEMFECQ MLiSEKQTKWE HVKKBGSERM 
TELADVFSGV KPLTKVEKME NLQAWFREIS KQILSLNYDD STAAGRKTVQ 
LIQALEEVQE FHQLESNLQV CQBliM3TOKF IUCJMIRTINI KEEVLITMOI 
VGDLSFAWQL IDSFTSIMQE SIRVNPSKVT KLRATFUCLA SALDLPLLRI 
NQANRPDLLS VSQYYSGELV SYVRKVLQII P ESMFT SLLK IIKI/JTHDII 
EVPTRLDKDK LRDYAQLGPR YEVAKLTHAI SIFTEGILMM KTTLVGIIKV 
DPKQLLEDGI RKELVKKVW ALHRGLIFNP RAKPSEUIPK LKELXiAIMDG 
FTOSFEYIQD YVNIYGLKIW QEEVSRIINY NVBQECNNFL RTKIQDW8SM 
YQSTOIPIPK FTPVDESVTF IGiaXREILR ITDPKMTCHI DQLNTWYCMK 
THQEVTSSRL FSEIQTTLGT PGLNGUORU* CFMIVKELQN FLSMFQKIIL 
RDRTVQDTLK TLMMAVSPLK SIVRNSNKIY FSAIMCTQKI WTAYLEAIMK 
VG(»1QILRQQ lANELWYSCR FDSKHIAAAL ENIUKALIAD lEAOTQDPSL 
PYPKECOTLL YEITAYLEAA GIHNPLNKIY ITTKRI-PYFP IVNFUlilAQ 
LPKLQYNKNL GMVCRKPTDP VDWPPLVLGL LTLLKQFHSR YTEQLLALIG 
QFICSTVEQC TSQKIPEIPA DWGAIiLFLE DYVRYTKLPR RVAEAHVPNF 
IFDEFRTVL* LFFLLLQWKD CP^IFPPSW NLKMKRNSVA HTTAFFXSIM 
GNIRRYE^DI SHGIS«YN-Y CLNHGITCNL YQIKAEHIFV LPLLNABCNC 
YV^IHLVLCS KELFVQLQIF SKIVLL 




Figure 12(B)



MLDFLAENNL CGQAILRIVS CGNAIIAELL RLSEFIPAVF KLKDRADQQK 
YGDIIFDFSV FKGPELWESK LDAKPELQDL DEEFRQJNIE IVTRFYLAFQ 
SVHKYIVDLN RYLDDIl^BGV YIQQTLETVL IHEDGKQLLC EALYLYGVML 
LVIDQXIEGE VREHMLVSYY RYSAARSSAD SNMDDICfCLL RSTGYSSQPG 
AKRPSI<nrP£S YFQRVPINES FISMVIGRLR SDDIVNQVSA YPLPEHRSTA 
LANQAAMLW ILYFEPSIIii TOQAKMREIV DKYFPDNWVI SIVMSnVNL 
VDAWEPYKAA KTALNNTLDL SNVFEQASRY ATVSE3WHAQ VQQFLKBGYL 
REEMVLENIP KLLNCLRDCN VAIRWLMLHT ADSACDPNNK RLRQIKDQIL 
TDSRVNPRIL PQLLLDTAQF EFILKEMFKQ MLSEKQTKWE HYKKEX3SERM 
TELADVFSGV KPLTRVEKNE NLQAWFREIS KQILSIIWDD STAAGRKIVQ 
LIQALEEVQE FHQLESNLQV OQFIADTRKF LHgMIRTINI KEEVLIOMQI 
VGDLSFAWQL IDSFTSIlfflE SIRVNPSMVT KLRAOTUOA SALDLPLLRI 
NQANRPDLLS VSQYYSGELV SYVRKVLQII PEaCTSLLK IIKLOTKDII 
EVPraLDKDK LRDYAQLGPR VEVAICLTHAI SlFTEGIWti KITLVGIIKV 
DPKQLLEDGI RKELVKRVAF ALHRGLIFNP RAKPSELMPK LKELGATMDG 
FHRSFEVIQD YVNIYGLKIW QEEVSRIINY NVEQBCNNFL RTKIQDWQSM 
YQSTHIPIPK FTPVDESVTF IGRLCREILR ITDPKMTCHI DQLNTWYDMK 
THQEVTSSRL FSEIQTTLGT PGLNGLDRLL CFMIVKELQN FLSMFXJKIIL 
RDRTVQDTLK TLMNAVSPLK SIVANSNKIY FSAIAKTQKI WTAYLEAIMK 
VGQMQILRQQ lANELNYSCR FDSKHLAAAL ENU^KAUAD IEAHYQDP5L 
PYPKEDOTLL YEITAYLEAA GIHNPLNKIY ITTKRLPYFP IVNFLFLIAQ 
LPKLQYNKNL GMVCRKPTDP VEWPPLVLGL LTLLKQFHSR YTEQLLALIG 
QFICSTVBQC TSQKIPEIPA DVA^GALLFLE DYVRYTKLPR RVAEAHVPNF 
IFDEFRTVL 
Figure 13(A)



AGG GGC GGA AGT CGG GGT CTG ACC CGC TCC AGG TCC GGG ACT GCG GAT 
AGA AGA GGA CCG CCG CCT TGA GGG AGG GGT GGA AAC TGG GTG CCG GCT
CCG CGC GCG ACC TCC GGC CCT GCG CGT GCG CCG TGG CGC GGC CCG GCT 
GAC AGG TTC TTT AAT GGA GGA GCC AAT CTC TCT GCA CAC CTG GTT TCA 
TCT AAT AAT ATA CAG ACA CCA GCT CTG AGG CCA GTT AAT CAT CCC CAG 
TGT CCA GGC ACA GAG TAG TCG GTC CGC CTC ACA ATG TTG GAC TTT CTA 
GCC GAG AAC AAC CTC TGT GGC CAA GCA ATC CTA AGG ATT GTT TCC TGT 
GGT AAT GCC ATC ATT GCT GAA CTT TTG AGA CTC TCT GAG TTT ATT CCT 
GCT GTG TTC AGG TTA AAA GAC AGA GCT GAT CAA CAG AAA TAT GGA GAT 
ATC ATA TTT GAT TTC AGC TAT TTT AAG GGT CCA GAA TTA TGG GAA AGC 
AAA CTG GAT GCT AAG CCA GAG CTA CAG GAT TTA GAT GAA GAA TTT CGT
GAA AAC AAC ATA GAA ATT GTG ACC AGA TTT TAT TTA GCA TTT CAA AGT 
GTA CAT AAA TAT ATT GTA GAC TTA AAC AGA TAT CTA GAT GAT CTC AAT 
GAA GGG GTT TAT ATT CAG CAA ACC TTA GAA ACT GTG CTT CTC AAT GAA 
GAT GGA AAA CAA CTT CTA TGT GAA GCA CTG TAC TTA TAT GGA GTT ATG
CTA CTG GTC ATT GAC CAA AAG ATT GAA GGA GAA GTC AGA GAG AGG ATG 
CTG GTT TCT TAC TAC CGA TAC AGT GCT GCT CGA TCT TCT GCT GAT TCA 
AAT ATG GAC GAT ATT TGT AAG CTG CTT CGA AGT ACA GGT TAT TCT AGC 
CAA CCA GGT GCC AAA AGA CCA TCC AAC TAT CCC GAG AGC TAT TTC CAG 
AGA GTG CCT ATC AAC GAA TCC TTC ATC AGT ATG GTC ATT GGT CGA CTG 
AGA TCT GAT GAT ATT TAC AAC CAG GTC TCA GCG TAT CCT TTG CCG GAG 
CAT CGC AGC ACA GCC CTG GCA AAC CAA GCT GCC ATG CTG TAC GTG ATT 
CTC TAC TTT GAG CCT TCC ATC CTT CAC ACC CAT CAA GCA AAA ATG AGA 
GAG ATA GTG GAT AAA TAC TTT CCA GAT AAT TGG GTA ATT AGT ATT TAC 
ATG GGG ATC ACA GTT AAT CTA GTA GAT GCT TGG GAA CCT TAC AAA GCT 
GCA AAA ACT GCT TTA AAT AAT ACC CTG GAC CTT TCA AAT GTC AGA GAA 
CAG GCA AGC AGA TAT GCT ACT GTC AGT GAA AGA GTG CAT GCT CAA GTG 
CAG CAA TTT CTA AAA GAA GGT TAT TTA AGG GAG GAG ATG GTT CTG GAC 
AAT ATC CCA AAG CTT CTG AAC TGC CTG AGA GAC TGC AAT GTT GCC ATC 
CGA TGG CTG ATG CTT CAT ACA GCA GAC TCA GCC TGT GAC CCA AAC AAC 
AAA CGC CTT CGT CAA ATC AAG GAC CAG ATT CTA ACA GAC TCT CGG TAC 
AAT CCC AGG ATC CTC TTC CAG CTG CTG TTA GAT ACT GCA CAA TTT GAG 
TTT ATA CTC AAA GAG ATG TTC AAG CAA ATG CTT TCA GAA AAG CAA ACC 
AAA TGG GAG CAT TAC AAG AAA GAG GGT TCG GAG CGG ATG ACT GAG CTT 
GCT GAT GTC TTT TCA GGA GTG AAA CCC CTA ACC AGA GTG GAG AAA AAT
GAA AAC CTT CAA GCT TGG TTC AGA GAG ATC TCA AAA CAA ATA TTG TCT 
TTA AAT TAT GAT GAT TCT ACT GCT GCG GGC AGA AAA ACT GTA CAA CTG 
ATA CAA GCT TTG GAA GAG GTT CAA GAA TTC CAC CAG TTG GAA TCC AAT 
CTG CAA GTA TGT CAG TTT CTT GCC GAT ACT CGA AAG TTT CTT CAT CAA 
ATG ATC AGA ACC ATT AAC ATT AAA GAG GAG GTT CTG ATC ACA ATG CAG 
ATC GTT GGG GAC CTT TCT TTC GCT TGG CAG TTG ATT GAC AGT TTC ACA 
TCC ATC ATG CAA GAA AGC ATA AGG GTA AAT CCA TCC ATG GTT ACT AAA 
CTC AGA GCT ACC TTC CTA AAG CTT GCC TCT GCC CTC GAT CTG CCC CTT 
CTT CGT ATT AAT CAG GCA AAT CGC CCC GAC CTG CTC AGC GTG TCA CAG 
TAC TAT TCT GGA GAG TTG GTA TCC TAT GTG AGA AAA GTT TTG CAG ATC 
ATC CCA GAA AGC ATG TTT ACA TCT CTT CTA AAG ATC ATA AAG CTT CAG 
ACC CAC GAC ATT ATT GAA GTG CCT ACC CGC CTG GAC AAA GAC AAG CTG 
AGG GAC TAT GCT CAG CTA GGC CCA CGA TAC GAG GTT GCC AAG CTT ACT 
CAT GCT ATT TCC ATT TTT ACT GAA GGC ATC TTA ATG ATG AAA ACG ACT 
Figure 13(B)



TTG GTT GGC ATC ATC AAG GTG GAT CCA AAG CAG TTG CTG GAA GAT GGA
ATA AGG AAA GAG CTT GTG AAG CGC GTT GCC TTT GCC CTG CAT AGG GGA
CTG ATA TTC AAC CCT CGA GCC AAG CCA AGT GAA TTG ATG CCC AAG CTG
AAA GAG TTG GGA GCG ACC ATG GAT GGA TTC CAT CGT TCT TTT GAA TAC
ATA CAG GAC TAT GTC AAC ATT TAT GGT CTG AAG ATT TGG CAG GAA GAA
GTA TCT CGT ATC ATA AAT TAC AAC GTG GAG CAA GAG TGT AAT AAC TTT
CTA AGA ACG AAG ATT CAA GAT TGG CAA AGC ATG TAC CAG TCC ACT CAT
ATT CCA ATA CCC AAG TTT ACC CCT GTG GAT GAG TCT GTA ACG TTT ATT
GGT CGA CTC TGC AGA GAA ATC CTG CGG ATC ACA GAC CCA AAA ATG ACA
TGT CAC ATA GAC CAG CTG AAC ACT TGG TAT GAT ATG AAA ACT CAT CAG
GAA GTG ACC AGC AGC CGC CTC TTC TCA GAA ATC CAG ACC ACC TTG GGA
ACC TTT GGT CTA AAT GGC TTA GAC AGG CTT CTG TGC TTT ATG ATT GTA
AAA GAG TTA CAG AAT TTC CTC AGT ATG TTT CAG AAA ATT ATC CTG AGA
GAC AGA ACT GTT CAG GAC ACT TTA AAA ACC CTC ATG AAT GCT GTC AGT
CCC CTA AAA AGT ATT GTC GCA AAT TCA AAT AAA ATT TAT TTT TCC GCC
ATT GCC AAA ACA CAG AAG ATT TGG ACT GCG TAT CTC GAG GCT ATA ATG
AAG GTT GGG CAG ATG CAG ATT CTG AGG CAA CAG ATT GCC AAT GAA TTA
AAT TAT TCT TGT CGG TTT GAT TCT AAA CAT CTG GCA GCT GCT CTG GAG
AAT CTC AAT AAG GCT CTC CTA GCA GAC ATT GAA GCC CAC TAT CAG GAC
CCT TCA CTT CCT TAC CCC AAA GAA GAT AAC ACA CTT TTA TAT GAA ATC
ACA GCC TAT CTG GAG GCA GCT GGC ATT CAC AAC CCA CTG AAT AAG ATA
TAC ATA ACA ACA AAG CGC TTA CCC TAT TTT CCA ATT GTA AAC TTT CTA
TTT TTG ATC GCT CAG TTG CCA AAA CTT CAA TAC AAC AAA AAT CTG GGA
ATG GTC TGC CGA AAA CCG ACC GAC CCG GTT GAT TGG CCA CCA CTT GTC
CTG GGA CTG CTC ACT CTG CTG AAG CAG TTC CAT TCC CGG TAC ACC GAG
CAG CTC CTG GCG CTG ATT GGC CAG TTT ATC TGC TCC ACG GTG GAG CAG
TGT ACA AGC CAG AAG ATA CCT GAA ATT CCT GCA GAT GTT GTG GGT GCC
CTT CTG TTC CTG GAG GAT TAT [illegible] CGG TAC ACA AAG CTA CCC AGG AGG
GTT GCT GAA GCA CAT GTG CCT AAT TTC ATT TTT GAT GAG TTC AGA ACA
GTG CTG TAA CTG TTT TTC CTA CTT CTT CAA TGG AAG GAT TGT CCT TAG
ATC TTC CCA CCA TCA CAA ATG AAT TTG AAG ATG AAA AGA AAC TCA GTT
GCT CAT ACA ACT GCA TTT TTT CTG TCT ATT ATG GGA AAC ATC AGA CGT
TAT GAG TAA GAT ATA TCT CAT GGC ATT AGT TAA TAT AAC TGA TAT TGT
TTA AAT CAT GGT ATT ACA TGC AAT TTA TAT CAG ATA AAA GCA GAA CAC
ATT TTT GTA CTG CCT CTC TTA AAT GCT GAA TGT AAC TGT TAT GTA TAA
ATC CAT TTA GTT TTA TGT TCT AAA GAA CTA TTT GTG CAA CTC CAG ATT
TTC AGT AAA ATA GTA TTA CTA GT
Figure 14(A)



Arg Gly Gly Ser Arg Gly Leu Thr Arg Ser Arg Ser Gly Thr Ala Asp
Arg Arg Gly Pro Pro Pro *   Gly Arg Gly Gly Asn Trp Val Pro Ala
Pro Arg Ala Thr Ser Gly Pro Ala Arg Ala Pro Trp Arg Gly Pro Ala
Asp Arg Phe Phe Asn Gly Gly Ala Asn Leu Ser Ala His Leu Val Ser
Ser Asn Asn Ile Gln Thr Pro Ala Leu Arg Pro Val Asn His Pro Gln
Cys Pro Gly Thr Glu *   Ser Val Arg Leu Thr Met Leu Asp Phe Leu
Ala Glu Asn Asn Leu Cys Gly Gln Ala Ile Leu Arg Ile Val Ser Cys
Gly Asn Ala Ile Ile Ala Glu Leu Leu Arg Leu Ser Glu Phe Ile Pro
Ala Val Phe Arg Leu Lys Asp Arg Ala Asp Gln Gln Lys Tyr Gly Asp
Ile Ile Phe Asp Phe Ser Tyr Phe Lys Gly Pro Glu Leu Trp Glu Ser
Lys Leu Asp Ala Lys Pro Glu Leu Gln Asp Leu Asp Glu Glu Phe Arg
Glu Asn Asn Ile Glu Ile Val Thr Arg Phe Tyr Leu Ala Phe Gln Ser
Val His Lys Tyr Ile Val Asp Leu Asn Arg Tyr Leu Asp Asp Leu Asn
Glu Gly Val Tyr Ile Gln Gln Thr Leu Glu Thr Val Leu Leu Asn Glu
Asp Gly Lys Gln Leu Leu Cys Glu Ala Leu Tyr Leu Tyr Gly Val Met
Leu Leu Val Ile Asp Gln Lys Ile Glu Gly Glu Val Arg Glu Arg Met
Leu Val Ser Tyr Tyr Arg Tyr Ser Ala Ala Arg Ser Ser Ala Asp Ser
Asn Met Asp Asp Ile Cys Lys Leu Leu Arg Ser Thr Gly Tyr Ser Ser
Gln Pro Gly Ala Lys Arg Pro Ser Asn Tyr Pro Glu Ser Tyr Phe Gln
Arg Val Pro Ile Asn Glu Ser Phe Ile Ser Met Val Ile Gly Arg Leu
Arg Ser Asp Asp Ile Tyr Asn Gln Val Ser Ala Tyr Pro Leu Pro Glu
His Arg Ser Thr Ala Leu Ala Asn Gln Ala Ala Met Leu Tyr Val Ile
Leu Tyr Phe Glu Pro Ser Ile Leu His Thr His Gln Ala Lys Met Arg
Glu Ile Val Asp Lys Tyr Phe Pro Asp Asn Trp Val Ile Ser Ile Tyr
Met Gly Ile Thr Val Asn Leu Val Asp Ala Trp Glu Pro Tyr Lys Ala
Ala Lys Thr Ala Leu Asn Asn Thr Leu Asp Leu Ser Asn Val Arg Glu
Gln Ala Ser Arg Tyr Ala Thr Val Ser Glu Arg Val His Ala Gln Val
Gln Gln Phe Leu Lys Glu Gly Tyr Leu Arg Glu Glu Met Val Leu Asp
Asn Ile Pro Lys Leu Leu Asn Cys Leu Arg Asp Cys Asn Val Ala Ile
Arg Trp Leu Met Leu His Thr Ala Asp Ser Ala Cys Asp Pro Asn Asn
Lys Arg Leu Arg Gln Ile Lys Asp Gln Ile Leu Thr Asp Ser Arg Tyr
Asn Pro Arg Ile Leu Phe Gln Leu Leu Leu Asp Thr Ala Gln Phe Glu
Phe Ile Leu Lys Glu Met Phe Lys Gln Met Leu Ser Glu Lys Gln Thr
Lys Trp Glu His Tyr Lys Lys Glu Gly Ser Glu Arg Met Thr Glu Leu
Ala Asp Val Phe Ser Gly Val Lys Pro Leu Thr Arg Val Glu Lys Asn
Glu Asn Leu Gln Ala Trp Phe Arg Glu Ile Ser Lys Gln Ile Leu Ser
Leu Asn Tyr Asp Asp Ser Thr Ala Ala Gly Arg Lys Thr Val Gln Leu
Ile Gln Ala Leu Glu Glu Val Gln Glu Phe His Gln Leu Glu Ser Asn
Leu Gln Val Cys Gln Phe Leu Ala Asp Thr Arg Lys Phe Leu His Gln
Met Ile Arg Thr Ile Asn Ile Lys Glu Glu Val Leu Ile Thr Met Gln
Ile Val Gly Asp Leu Ser Phe Ala Trp Gln Leu Ile Asp Ser Phe Thr
Ser Ile Met Gln Glu Ser Ile Arg Val Asn Pro Ser Met Val Thr Lys
Leu Arg Ala Thr Phe Leu Lys Leu Ala Ser Ala Leu Asp Leu Pro Leu
Leu Arg Ile Asn Gln Ala Asn Arg Pro Asp Leu Leu Ser Val Ser Gln
Tyr Tyr Ser Gly Glu Leu Val Ser Tyr Val Arg Lys Val Leu Gln Ile
Ile Pro Glu Ser Met Phe Thr Ser Leu Leu Lys Ile Ile Lys Leu Gln
Thr His Asp Ile Ile Glu Val Pro Thr Arg Leu Asp Lys Asp Lys Leu
Arg Asp Tyr Ala Gln Leu Gly Pro Arg Tyr Glu Val Ala Lys Leu Thr
His Ala Ile Ser Ile Phe Thr Glu Gly Ile Leu Met Met Lys Thr Thr
Leu Val Gly Ile Ile Lys Val Asp Pro Lys Gln Leu Leu Glu Asp Gly
Ile Arg Lys Glu Leu Val Lys Arg Val Ala Phe Ala Leu His Arg Gly
Figure 14(B)



Leu Ile Phe Asn Pro Arg Ala Lys Pro Ser Glu Leu Met Pro Lys Leu
Lys Glu Leu Gly Ala Thr Met Asp Gly Phe His Arg Ser Phe Glu Tyr
Ile Gln Asp Tyr Val Asn Ile Tyr Gly Leu Lys Ile Trp Gln Glu Glu
Val Ser Arg Ile Ile Asn Tyr Asn Val Glu Gln Glu Cys Asn Asn Phe
Leu Arg Thr Lys Ile Gln Asp Trp Gln Ser Met Tyr Gln Ser Thr His
Ile Pro Ile Pro Lys Phe Thr Pro Val Asp Glu Ser Val Thr Phe Ile
Gly Arg Leu Cys Arg Glu Ile Leu Arg Ile Thr Asp Pro Lys Met Thr
Cys His Ile Asp Gln Leu Asn Thr Trp Tyr Asp Met Lys Thr His Gln
Glu Val Thr Ser Ser Arg Leu Phe Ser Glu Ile Gln Thr Thr Leu Gly
Thr Phe Gly Leu Asn Gly Leu Asp Arg Leu Leu Cys Phe Met Ile Val
Lys Glu Leu Gln Asn Phe Leu Ser Met Phe Gln Lys Ile Ile Leu Arg
Asp Arg Thr Val Gln Asp Thr Leu Lys Thr Leu Met Asn Ala Val Ser
Pro Leu Lys Ser Ile Val Ala Asn Ser Asn Lys Ile Tyr Phe Ser Ala
Ile Ala Lys Thr Gln Lys Ile Trp Thr Ala Tyr Leu Glu Ala Ile Met
Lys Val Gly Gln Met Gln Ile Leu Arg Gln Gln Ile Ala Asn Glu Leu
Asn Tyr Ser Cys Arg Phe Asp Ser Lys His Leu Ala Ala Ala Leu Glu
Asn Leu Asn Lys Ala Leu Leu Ala Asp Ile Glu Ala His Tyr Gln Asp
Pro Ser Leu Pro Tyr Pro Lys Glu Asp Asn Thr Leu Leu Tyr Glu Ile
Thr Ala Tyr Leu Glu Ala Ala Gly Ile His Asn Pro Leu Asn Lys Ile
Tyr Ile Thr Thr Lys Arg Leu Pro Tyr Phe Pro Ile Val Asn Phe Leu
Phe Leu Ile Ala Gln Leu Pro Lys Leu Gln Tyr Asn Lys Asn Leu Gly
Met Val Cys Arg Lys Pro Thr Asp Pro Val Asp Trp Pro Pro Leu Val
Leu Gly Leu Leu Thr Leu Leu Lys Gln Phe His Ser Arg Tyr Thr Glu
Gln Leu Leu Ala Leu Ile Gly Gln Phe Ile Cys Ser Thr Val Glu Gln
Cys Thr Ser Gln Lys Ile Pro Glu Ile Pro Ala Asp Val Val Gly Ala
Leu Leu Phe Leu Glu Asp Tyr Val Arg Tyr Thr Lys Leu Pro Arg Arg
Val Ala Glu Ala His Val Pro Asn Phe Ile Phe Asp Glu Phe Arg Thr
Val Leu *   Leu Phe Phe Leu Leu Leu Gln Trp Lys Asp Cys Pro *
Ile Phe Pro Pro Ser Gln Met Asn Leu Lys Met Lys Arg Asn Ser Val
Ala His Thr Thr Ala Phe Phe Leu Ser Ile Met Gly Asn Ile Arg Arg
Tyr Glu *   Asp Ile Ser His Gly Ile Ser *   Tyr Asn *   Tyr Cys
Leu Asn His Gly Ile Thr Cys Asn Leu Tyr Gln Ile Lys Ala Glu His
Ile Phe Val Leu Pro Leu Leu Asn Ala Glu Cys Asn Cys Tyr Val *
Ile His Leu Val Leu Cys Ser Lys Glu Leu Phe Val Gln Leu Gln Ile
Phe Ser Lys Ile Val Leu Leu
Figure 15



+ strand (sense)     1st base   sequence (5'->3')

 1. pch13-sp6-1f       370     TTT ACT TCT AAC GCT TAT TC
 2. pch13-sp6-2f       726     TGA AGG ACT CCT TTG AGA CG
 3. T7.1              1140     TCA CAA TGG GCT ACT GG
 4. T7.2              1361     TTC AAC GAG GGA GAT GG
 5. T7.3              1602     TTA GCA CCA CTG AGA GA
 6. T7.4              2041     GTT CTT TTA GGC ATT TA
 7. ch13-2480         2486     GCT GCG TCT GTT CGT CAG C

- strand (antisense)

 8. SP6.1             2746     CCT CTG CTT CAC AAC AT
 9. SP6.2             2490     GCA GtA GGG CGG ACA CC
                                    (C)
10. SP6.3             2213     AGG GTC TTC TTC ATT GT
11. SP6.4             1812     GGA TTG TCT TTG TCT CT
12. pch13-t7-1f       1165     AGT GCA CTT CCA TGG GCG TG
13. pch13-t7-1fa       712     CCT TCA TCA GGT TGA CGA AC
14. pch13-t7-2fa       286     GCG GCA ATC AGA AAC GGA AG
15. CH13-AS-1          536     TGA ACA CGT GGT ACA T



WO 97/38085



Figure 16(A)



PCT/US97/05930



   1 CTTCrCTGAG CCCTTTCTGC CTGTGTAGGA AGCAGAAGGC GGAATGTCGG
  51 CTCTGCCCTT CTCCGTAAGA TGGTCCATTA AAACGTTCCT TATAAACTGG
 101 AAATGAAGGC TTGGGAAGAT GGCIAAAATC AGCAATCCTT GGAATAACGC
 151 AGAAGCATCC CTGCTTCCCT GGGCCCGCCC GTGGGCCTGC TrGTGCTGTT
 201 CAGTAGGTGG TTTTTAGAAA GGGCTTCCTT CAGCX3TCATT AGCAACAGGA
 251 GTCGTCGTCC GTTTGCATGA GGAAATGTTC TTAACCTTCC GTTTCTGATT
 301 GCCTCTAGAC TGCATCTGTC ATAGACAAAT GCCCCCATCT TTTACAGAGA
 351 ACCAGTCTCT TCTTTAAACT TTACTTCTAA CGCTTATTCT TTTTACCTTA
 401 TATAGGAAAC CACTGATTCC TTGTGTGGAG AAACAGCTAT TAGGAGAACA
 451 TTTAACAGCA ATTCTGCAGA AAGGGCTCGA CCACTTACTG GATGAGAACA
 501 GAGTGCCGGA CCTCGCACAG ATGTACX:AGC TGITCAGCCG GGTGAGGGGC
 551 GGGCAGCASG CGCTGCTGCA GCACTGGAGC GAGTACATCA AGACTTTTGG
 601 AACAGCGATC GTAATCAATC CTGAGAAAGA CAAAGACATG GTCCAAGACC
 651 TGTTGGACTT CAAGGACAAG GTGGACCACG TGATCGAGGT CTGCTTCCAG
 701 AAGAATGAGC GGTTCGTCAA CXTTGATGAAG GAGTCCTTTG AGACGTTCAT
 751 CAACAAGAGA CXrCAACAAGC CTCCAGAACT GATCGCAAAG CATGTGGATT
 801 CAAAGTTAAG AGCAGGCAAC AAAGAAGCCA CAGACGAGGA GCTGGAGCGG
 851 ACGTTGGACA AGATCATGAT CCTGTTCAGG TTTATCCACG GTAAAGATGT
 901 CTTTGAAGCA TTTTATAAAA AAGATTTGGC AAAAAGACTC CTTGTTGGGA
 951 AAAGTGCCTC AGTOGATGCT GAAAAGTCTA TCTTGTCAAA GCTCAAGCAT
1001 GAGTGCGGTG CAGCCTTCAC CAGCAAGCTG GAAGGCATGT TCAAGGACAT
1051 GGAGCTTTCG AAGGACATCA TCX?ITCATTT CAAGCAGCAT ATGCAGAATC
1101 AGAGTGACTC AGGCCCTATA GACCTCACAG TCAACATACT CACAATGGGC
1151 TACTGGCCAA CATACACGCC CATGGAAGTG CACTTAACCC CAGAAATGAT
1201 TAAACnCAG GAAGTATTTA AGGCAnTTA TCTTCGAAAG CACAGTGGTC
1251 GAAAACTTCA GTGGCAAACT ACTTTCGGAC ATGCTGnrr aaaagcggag
1301 TTTAAAGAAG GGAAGAAGGA ATTCCAGGTG TCCCTCTTCC AGACACTGGT
1351 GCTCCTCATG TTCAACGAGG GAGATGGCTT CAGCTTTGAG GAGATAAAAA
1401 TGGCCACGGG GATAGAGGAT AGTGAATTGC GCAGAACGCT GCAGTCCCTG
1451 GCCTGTGGCA AAGCACGTGT GCTGATTAAA AGTCCCAAAG GAAAGGAAGT
1501 GGAAGATGGA GACAAGTTCA TmTAATGG AGAGTTCAAG CACAAGTTGT
1551 TTAGAATAAA GATCAATCAA ATTCAGATGA AGGAAACTGT TGAGGAACAG
1601 GTTAGCACCA CTGAGAGAGT GTTTCAGGAT AGACAATATC AGATTGATGC
1651 TGCTATCGTC AGAATAATGA AGATGAGAAA GACTCTTGGT CATAATCTTC
1701 TACriTlX-riGA ATTATATAAT CAGCTGAAAT TTCCAGTAAA GCCTGGAGAT
1751 TTGAAAAAGA GAATTGAATC TCTGATAGAC AGAGACTATA TGGAGAGAGA
1801 CAAAGACAAT CCX3AATCAGT ACCACTACGT GGCXTTGACGC ATCTGCAGAC
1851 GGTTCCCCTT CATGAAACAC TAGAATGTAC CCTCAGAGCA GGAAGCACAC
1901 CTGTGCCATT TCTGGGACTC TGATTGATCC AGCTGTGGAC ATTGGAAGGC
1951 GAAGGAAGGG AGGTGGCTCC TOGGTCATCT TTCACAAGGC TCAAGACTTC
2001 AACCTQCAGA TGTATCTTTT TCCCTCCAGT TTTTCCTCTA GTTCTTTTAG
2051 GCATTTAAAT TGTTTCTGTr ACTCTGTGCA AAATAACTTT GAGATTGGAC
2101 AAGAAGATGT TACTAAAGAG AAGTrCCTTT AAAAGGTCTT GTTCTTGTGT
2151 CAAAAAGCTG CAAGITIUTT TTCTTCTCGT GTGTGATCAT GAGTGCACAA
2201 TGAAGAAGAC CXTTAGATGCT GCATimTA GCTCTGAAGA TTCCTTAGGT
2251 ATCCCTGAAG ACAGCTCGCT CAGATGATCA GCATTTAGAG TGAAAACAAG
2301 GGCCCriXJVT GGGTGAACAT TAGAAAGAGC CAGGGTTCAA AGCTGGCGAA
2351 TGGATGACX^ ACCCTAGCCA CTQGCOCCTC CCrcnTCAT GTATTTCCAA
2401 AAGTTGTAAA CirrGGTOGC TGATTTTTCG TAAGTCAGGT TTCTAAGTGA
2451 GCTCCCTGAG GTGCCAAGGC CATCGTGTCC GCCCTGCTGC GTCTGTTCGT
2501 CAGCTGAGTT CXrTTGTGAAT CTCTGTTTTA GGGGTTGGGG CTAGTGTGTT
Figure 16(B)



2551 TGTGTProCA TTCTAAGATT GAGTCTGGCA GTXXCTGTTT TnTGCATTC
2601 GGGTAACTCC TCTTTGATIT TTTrTAATTG CAGTATTTCT GTGATTGCAA
2651 TAATAAAGTT TGGTTTCGTT TTTACAGTCA TCCGCAGGGA CGATCCITCT
2701 TCTCTGCTCT AAACTGTAAA AAGTITATCG AGACCTAAAG TCTTGATC5TT
2751 GTCAAGCAGA GGTTATTTTC3 TGGAAAGATT AAAAGGATTT TGTTGGTACC
2801 [illegible] TTCTGTATAT ATACATCAGG TTGAACAGTG AAAGGAAAGT
2851 TCAGTAGTGA TGTTAGAAGG GTAACTATGA CAAAGATACT TTTGAGATAA
2901 CATTTAAAAG TACTTTATAT TTTACATAAT AGCATCTTTC ATTTTCATTA
2951 AAAGCTACCA AAGGAATTTT GATCATGGCA TAAGTGTTTA AAGCAATATT
3001 TTCTGGAATA TACXAAGTTT ATATAATTTG ATTTTCTCCT AAATTATTAA
3051 GAGTCicrrr TTCAAACATG aXSGTTTGAA ATATOACACC TIGTCGGOTr
3101 CCATATTAAA ATCCTCACTC TTTAATTCTC ATTriTATCT TTGAAAATTT
3151 TCATTTATGA GTTCCATGAT ATCTGGTCTA AGAAAGACCA AACAGATTTC
3201 TA'lTmiTT TCTTATAAGT TCGTTX3TCTC TAGAGATTOT TAATATTGTA
3251 ATTTAATGTA GACTTACTTT GAATAAAATT AGTTTAATTG GCCTTAAAAT
3301 TACATTAATA AAACTTTGTG ATATGCAAAT GACACATTC
Figure 17



1 FPEPFLFV*E AfiGGMSALPF SVRMCUdFL IMWK*KLGKM AKISNFWNNA 

51 EASLLFWARP VZAiCLCTSVGG F*KGLPSASL ATSVWRLHE EKFLTFRF«L 

101 PLDCICKROM PPSFTEl^SL L*TLLLTLIL FTLYRKPLIA CVEKQLLGEH 

151 LTAILQKGLD HLLDEHRVPD lAQM^QLFSR VRGGQQALLQ HWSEYIKITO 

201 TAIVINPEKD KI]MV;QDL£I}F KDfCVDHVXEV CFQXNERFVN LMKESFETFI 

251 NKRPNKPAEL IAKHVD6KLR AGNKEATDEE LERTLDKIMI LFRFIHGfCDV^ 

301 FEAFYKKDLA KRLLVGKSAS VDAEICSMLSK LKHECX^AAFT SKLEIGMFKEM 

351 EliSiCDIMVHF KQHMt»XJSDS GPIDLTVNIL OMGVWPTYTP MEVHLTPEMI 

401 KLQEVFKAFV USKH SGRKLQ WQTHiGHAVL KAEFKEGKKE FQVSLFX2TL.V 

451 IXMFNEXSDSF SFEEIKMATG lEDSELRRTL QSLACGKARV LIKSPfCGKEVT 

501 EDGOKFIFdS EFKHKLFRIK INQIQMKETV EEQVSTTERV FQOERQYQIDA 

551 AIVRIHKKRK TLGKZ^VSE LVNQLKFPVK PGDLKKRIES LIDRDYMERD 

601 KENFNQYHYV A«RICRRFPF MKH*NVPSBQ EAKLCHFWDS D*SSCGHWKA 

651 KEGRWLL6HL SQGSRLQPAD VSFSLQFFL* FF«AFKLFLL LCAK«L«DOT 

701 RRCY*REVPL KGLVLVSKSC KFGLFSCVIM SAQ«RRP*KL HFLAIJCIP*V 

751 SLKTARSDDQ KLE«KQGPFH GEH«KEPGFK AGENMTHPSK WPLPVSCXSK 

801 SCKLHWLIFR KSGF«VSSLR CQGHGVRPAA SVRQLSSL*! SVLGVGASVF 

851 VFPF*D«VWQ SLFFCZGVTA L*FFLIAVFV •LQ««SLVWF LQSCAGTILV 

901 LCCKL*iCWG DLKS«CCEA£ VIUWKD*KDF VGTV/FCWVI YMRLNSERKV 

951 Q-.C*KGMyD KDTFEItFKS TLYFT*«HVS F^LKATKGIL IMA-VFKAIF 

1001 SGIYQVYII- FCAKLLRVSF •NMRV^NMTP CGFPY-NPHS LIVIFIFENF 

1051 HL-VP-YW* ERPNRFIFFF LISSLCLEIV NIVI-CRLTL NKISLIGLKI 

1101 TLIKLCCMQM TH 



MYQLFSR VRGGQQALLQ HWSEYIKTFG 

201 TAIVINPEaCD KEMVQDLLDF KDKVEHVIEV CPQKNERFVN UfiCESFETFI 

251 NKRPKKPAEL lAKHVDSKLR AGMKEATDEE LERTLIKIHI LFRFIHGKDV 

301 FEAFYKKOLA KRLLVGKSAS VDAEKSMLSK LKHBCGAAFT SKLBGMFKEM 

351 ELSKDIMVHF KQHMQNQSDS GPIDLTVNIL TMSYWPTTTP MEVHLTPEMI 

401 KLQEVFKAFY LGKHSGRKLQ waTTLGHAVL KAEFKEGKKE FQVSLFQTLV 

451 LLMFTJBGDGF SFEEIKMATC lEDSELRRTL QSLACGKARV LIKSPKGKEV 

501 EDGDKFIF1I3 EFKHKLFRIK INQIQMKETV EBQVSTTERV FQDRQYQIDA 

551 AIVRIMKMRK TLGHNLLVSE LYNQLKFPVK PGDLKKRIES LIDRDYMERD 

601 KENFNQYKYV A 
Figure 18



+ strand (sense)     1st base   sequence (5'->3')

 1. pch14-sp6-1f       686     GGC TTA ACA CTC AAT GTA C
 2. pch14-sp6-2f      1005     CTA TGA AAA GAC AGC TTA AG
 3. pch14-SP6-3f      1315     ATT TAG TTT GAA AAG CAT G
 4. pch14-sp6-4f      1589     CAG ACT TTA AAG TCA CAA G
 5. pch14-sp6-5f      1808     CAA AGA CTT

- strand (antisense)           sequence (5'->3')

 6. pch14-sp6-6fb     2020     GCA GTT TAA TTT GGT CCT G
 7. pch14-sp6-5fb     1757     CTG TAA TTA TAG TTC TGT C
 8. pch14-sp6-4fb     1607     CTT GTG ACT TTA AAG TCT G
 9. pch14-sp6-3fb     1339     ATA ATC ATG CTT TTC AAA C
10. pch14-sp6-2rb     1023     TTA AGC TGT CTT TTC ATA G
11. pch14-sp6-1rb      704     GTA CAT TGA GTG TTA AAC C
12. CH14a              629     CGG CAG AGC TGA CTA CTG GAA
13. CH14b              644     CAA GCA GGG AAG TAA CGG CAG
14. CH14c              109     CTT GTT AGC TTG TTT AGA AGG
                               TGG AAG AG
15. [illegible]         90     GGT GGA AGA GAA GGT CTC CTT
                               TCA GGC
Figure 19



   1 GAAGATGATC ATTACGGGTC TCGAACAGGA AGCATCTCCA GCAGTGTGTC
  51 TGTGCCTGCA AAGCCTGAAA GGAGACCTTC TCTTCCACCT TCTAAACAAG
 101 CTAACAAGAA TCTGATTTTG AAGGCTATAT CTGAAGCTCA AGAATCCGTA
 151 ACAAAAACAA CTAACTACTC TACAGTTCCA CAGAAACAGA CACTTCCAGT
 201 TGCTCCXAGA ACTCGAACTT CTCAAGAAGA ATTGCTAGCA GAAGTGGTCC
 251 AGGGACAAAG TAGGACXXXrC AGAATAAGTC CXXXXATTAA AGAAGAGGAA
 301 ACAAAAGGAG ATTCTCTAGA AAAAAATCAA GCTGAGATGA GTGAACTGAG
 351 TGTGGCACAG AAACXAGAAA AACTTTTGGA GCX3CTGCAAG TACTGGCCTG
 401 CTTGTAAAAA TGGGGATGAG TGTGCCTACC ATCACCCCAT CTCACCCTGC
 451 AAAGCCTTCC CCAATTGTAA ATTTGCTGAA AAATCTTTOT TTGTTCACCC
 501 AAAnUiAAA TATGATGCAA AGTGTACTAA ACCAGATTGT CCCTTCACTC
 551 ATCiXiAGTAG AAGAATTCCA GTACTGTCTC CAAAACCAGT TGCACCACCA
 601 GCACCACXnT CCAGTAGTCA GCTCTGCCGT TACTICCCTG CTTGTAAGAA
 651 GATGGAATGT CCCrrCTATC ATCCAAAACA TTGTAGGTTT AACACTCAAT
 701 GTACAAGTCC GGACTGCACA TTCTACCATC CCACCATTAA TGTCCCACCA
 751 CX3ACATGCCT TGAAATGGAT TCGACCTCAA ACCAGCGAAT AGCACCCAGT
 801 CXrTGCCTGGC AGAAGATCAT GCAGTTTGGA AGTITTCATG TACTGATGAA
 851 AGATACIXITA CAGAACTTGT CAAATCTTTG AAACTTGGAA TATATTGCTT
 901 TCATAATATG AAGTTTTATT GCCTATCTAT CTGAAGTGTC TAATnTTCA
 951 AGTTTGTAAG TTTATTATGr GG'l'l'i'iAACA TTGGGTGTTT TrGrXTIVlT
1001 TTTACTATGA AAAGACAGCT TAAGGAAGAG CTAAATTCTC TTAAAATATT
1051 TCGGGCATGT TTGTGCACTG CTGTTGTGAG GATCAGCATA TGAAATTGAC
1101 ATCATGGTTA GTCATGGTAC TGCAGCTTAG GGGGCTACAC GGTTGCTGTG
1151 TGAGTGGAGA GATGCAGTGA GGCAGTTGTC ATTATTCTAA AAATTGTACT
1201 ACnrCACTT TTCCCAAAGA TTATATAATG TTCATAATCC ACCATGAAAA
1251 CAGCATTGGC CAAAGGTACT GAGGCTGCTT AAAATATTCA ATTCTGCTTT
1301 TTAGTITTTA AGTGAATTTA GTTTGAAAAG CATGATTATA CAGGCCTCTC
1351 GAGGCTGAGT GCTACTTTCG GTAAAGTTCC AGTTTTCCAG CCTTCTGTGA
1401 CAGGATGAAT GAGGTGGGTA TGGACAGTGG AGGCAGCTGG AATGGCAAGT
1451 GCAGAAAATA GGAACAGTk: TATACAGTGC TCTCATTTAC TAATAACATA
1501 ATGCCTTCTA AATAATTTIT TTGGGAAACT ACATTATCAC AAAAITATAC
1551 AAA'rrrrrrr ACAAGTATTT ACATACTGTA TCTGAAAACA GACTTTAAAG
1601 TCACAAGATT ATAAATGTAC ATATGTATTC TCACATTCTG AAAAATAACA
1651 TTCTCAGAAT CCACAGAAAA TATACTTAGT TACTACTGAA GATAATTTTT
1701 GAAATGTAAA AATTAGATTT AAATAGTATA TTTTAAATGA CAGAACTATA
1751 ATTACAGAGA TCAGATCAGA TAGGTAAACT GCAAGATAGA TAGGATGAAA
1801 CmTGGCCT ACrGTATTAC TTACAGAGTT TTTTTGTGTG TGCrTTTTAA
1851 AACTGTTAAG GCAAGAAGTG TCAAATGCTT TAGAGITAAA TAACAGATCA
1901 CTGATTTCAA AGACTTGGTG TATAGTGTTA AAAATTAAAG CTTAAAAGGT
1951 GGTTAGAAAA GTGGATTAAT GCAAAAGGGG TAATAAAGAC TGCAACATTC
2001 TCAGGACCAA ATTAAACT^C T
Figure 20



1 EDDDY GSRTC SISSSVSVPA KPERRPSLPP SKQANKNLIL KAISEAQESV 

51 TKTTOYSTVP QKQTLPVAPR TRTSQEELLA EWQGQSRTP RISPPIKEEE 

101 TKGDSVEKNQ AE21SELSVAQ KPEKLLERCK YWPACKNGDE CAYHHPISPC 

151 KAFPNCKFAE KCLFVHPNCK YnAKCTKPDC PFTHVSRRIP VLSPKPVAPP 

201 APPSSSQLCR YFPACKKMEC PFYHPKHCRF NTQCTSPDCT FYHPTINVPP 

251 RHALKWIRPQ TSE-RPVLPG RRSCSLEVFM Y««KILYRTC QIFETONILL 

301 S^YEVIXPiy LKCLIFQVCK FIMWF^HWVF LFCFYYEKTA •GRAKFC-NI 

351 WSMFV HCCCE DQHMKLTSWL VMVLQLRGLH GCCVSGEMQ- GSCHYSKNCT 

401 TFTFPKOTIM FIIHHENSIG QRY-GCLKYS ILLFSF*VNL V-KA-LYRPL 

451 EAECYFR^SS SFPAFCDRMN EVGMDSGGSW NGKCRK.EQF YTVLSFINNI 

501 MPSK^FFWET TLSQNYTOFF TSIYILYLKT DFKVTRL^MY ICILTF-KIT 

551 FSESTQULS YY-R-FLKCK N-I-IVYKK- QNYNYRDQIR •VNCKIDKMK 

601 LLAYCITYRV FIXnA/FKTVK ARSVKCFRVK -QITDFKDLV YSVKN.SLKG 

651 G^KSGI^QKG ••RLQHSQDQ IKL 



EDDDYGSRTO SISSSVSVPA KPERRPSLPP SKQANKNLIL KAISEAQESV 
TRTINYSTVP QKQTLPVAPR TRTSQEELLA EWQGQSRTP RISPPIKEEE 
TKGDSVEKNQ AQISELSVAQ KPEKLLER CK YWPACKNGDE CAYHH PTfiPr 
KAFPNCKFAE KCLFVHPNCK YDAKCTKPDC PFTHVSRRIP VLSPKP^A^vpp 



APPSSSQ LCR YFPACKKMEC P FYHPKHCRF 
RHAUWIRPQ TSE 



Figure 21



1 AAAACTTTCG GAAGAGAAAG TTGCCTGTGG TAAGTTCAGT TGTTAAAGTA 

51 AAAAAATTCA ATCATGATCG AGAAGAGGAG GAAGAAGATG ATGATTACGG 

101 GTCTCGAACA GGAAGCATCT CCAGCAGTGT GTCTGTGCCT GCAAAGCCTG 

151 AAAGGAGACC TTCTCTTCCA CCTTCTAAAC AAGCTAACAA GAATCTGATT 

201 TTOAAGGCTA TATCTGAAGC TCAAGAATCC GTAACAAAAA CAACTAACTA 

251 CTCTACAGIT CCACAGAAAC AGACACTTCC AGTTGCTCCC AGAACTCGAA 

301 CTTCTCAAGA AGAATTGCTA GCAGAA3TGG TCCAGGGGAC AAAGTAGGAC 

351 CCCCAGAATA AGTCCXTCCCA TTAAAGAAGA GGAAACAAAA GGAGATTCTG 

401 TAGAAAAAAA TCAAGATTAC TATGACATGG AATCCATGGT CCATGCAGAC 

451 ACAAGATCAT TTATTCTGAA GAAGCCAAAG CTGTCTGAGG AAGTANTAGT 

501 GGCACCAAAC CAAGANTCGG GGATGAAGAC TGCAGATICC CTTCGGGTTC 

551 TTTCAGGGAC CCTTATGCAG ACACNAGATC TTGTTCAACC AGATAAACCT 

601 gcaagixxx:a AG 



1 KTFGRESCLW •VQLLK'KNS IMMEKPIIKKM MITGLEQEAS PAVCLCLQSL 

51 KGDLLFHLW KLTRI*F*RL YLKLKNP«QK QLTTLQFHRN RHFQLLPELE 

101 LLKKNC«QKW SRGQSRTPRI SPPIKEEETK GDSVEKNQDY YCMESKV7KAD 

151 TRSFILKKPK LSEEVXVAFN QXSGMKTADS UOJLSGTUIQ TXDLVQPDKP 

201 ASPK 



1 NAGCTGCTCT GACGGGNAGN GGAATGNATG GNGGCTTGTT CNGAAACNNG 

51 CCAGATGGCG NGAGGGGGAC AAGTAGCGGC GTGATTOAGA AGAGGGAGGT 

101 GAGGG™rrC ACATCACCNC ATCTOACCAT GNCGNGCCNT CCCCANTANT 

151 AAl^AMTGATG ATAGNGGGAA GTGGGCCCAC CCAGAAGCNT GATTGAGCGG 

201 CCGCCAGTAN GAAAGNNGTT TGTCCANTTA GNCATACNNA TOGTAGGGTT 

251 Q^AGCNGCX^T CCCCGGCACC NGCANANNNN CNNCNGGGAC NACNGCCCNN 

301 NNNTONGTTA NUCNGNGIIAG MNAAAAAATT CAATCATGAT GGAGAAGAGG 

351 AGGAA GAAGA TCATGATTAC GQGTCTCGAA CAGGAAGCAT CTCCAGCAGT 

401 GTGTCTGTGC CTGCAAA 



Untitled translated in RF 2 

1 SCSDGXXN3CW XLVXKXAHNX EGDK«RRDXE EGGBGXHITX SXHXXXSPXX 

51 XXHZXGSGPT QKXD-AAASX KXVCPXXKXX XKVXXASPAX AXXXXGXXPX 

101 XXLXXXXKKF NHDGF>;Kh^:U DDYGSRTGSI SSSVSVPA 
Figure 22



CH1-9a11'2 

GA AAA CAA ATG GAA GAA ATG CAA AAG GCT TTC AAT AAA ACA ATC GTG 
AAA CTT CAG AAT ACT TCA AGA ATA GCA GAG GAG GAG GAT CAG CGG CAA 
ACT GAA GCC ATC CAG TTG CTA CAG GCA CAG CTG ACC AAC ATG ACA CAG 
CTT GTT CAA 

Lys Gin Met Glu Glu Met Gin Lys Ala Phe Asn Lys Thr lie Val Lys 
Leu Gin Asn Thr Ser Arg lie Ala Glu Glu Gin Asp Gin Arg Gin Thr 
Glu Ala lie Gin Leu Leu Gin Ala Gin Leu Thr Asn Met Thr Gin Leu 
Val Gin 



CH8-2a13-1 



GAA CAG 


GCA 


AGC 


AGA 


TAT 


GCT ACT GTC 


AGT 


GAA 


AGA 


GTG 


CAT 


GCT 


CAA 


GTG CAG 


CAA 


TTT 


CTA 


AAA 


GAA GGT TAT 


TTA 


AGG 


GAG 


GAG 


ATG 


GTT 


CTG 


GAG AAT 


ATC 


CCA 


AAG 


CTT 


CTG AAC TGC 


CTG 


AGA 


GAC 


TGC 


AAT 


GTT 


GCC 


ATC CGA 


TGG 


CTG 


ATG 


CTT 


C 
















Glu Gin 


Ala 


Ser 


Arg 


Tyr 


Ala Thr Val 


Ser 


Glu 


Arg 


Val 


His 


Ala 


Gin 


Val Gin 


Gin 


Phe 


Leu 


Lys 


Glu Gly Tyr I^eu 


Arg 


Glu 


Glu 


Met 


Val 


Leu 


Asp Asn 


lie 


Pro 


Lys 


Leu 


Leu .Asn Cys 


Leu 


Arg 


Asp 


Cys 


Asn 


Val 


Ala 


lie Arg Trp 


Leu 


Met 


Leu 


















CH13-2a12-1 

CTC ACA ATG GGC TAC TGG CCA ACA TAC ACG CCC ATG GAA GTG CAC TTA 
ACC CCA GAA ATG ATT AAA CTT CAG GAA GTA TTT AAG GCA TTT TAT CTT 
GGA AAG CAC AG 

Leu Thr Met Gly Tyr Trp Pro Thr Tyr Thr Pro Met Glu Val His Leu 
Thr Pro Glu Met Ile Lys Leu Gln Glu Val Phe Lys Ala Phe Tyr Leu 
Gly Lys His 



CH14-2a16-1 

TG TTT GTT CAC CCA AAT TGT AAA TAT GAT GCA AAG TGT ACT AAA CCA 

GAT TGT CCC TTC ACT CAT GTG AGT AGA AGA ATT CCA GTA CTG TCT CCA 
AAA CCA GTT GCA CCA CCA G 

Phe Val His Pro Asn Cys Lys Tyr Asp Ala Lys Cys Thr Lys Pro Asp 
Cys Pro Phe Thr His Val Ser Arg Arg Ile Pro Val Leu Ser Pro Lys 
Pro Val Ala Pro Pro 



WO 97/38085 



PCT/US97/05930 




Figure 23(A) 



CTCAGAGAGG GCTGCCAGGA CGCGAGCCAC TGAGGAGCCG CTCAGCCAGC 
GCCATAGCCC TTAGGACTAT CGGTCACATT CTCGCGCTCC TGCTCCGGCT 
CCTCCATCTT GGCCTCGGCA GTGGCGGCTG CCGGGAGGAT GTGCCGCCTT 
CTGGCAGGGG GAAGAAGGAG GAGAAGATGA AGAAGCACCG GCGGGCCTTG 
GCCCTGGTCT CCTGCCTCTT TCTGTGCTCT CTGGTCTGGC TTCCCAGCTG 
GCGTGTATGT TGTAAAGAGA GTTCCTCAGC TTCAGCGTCA TCATATTACT 
CTCAAGATGA CAACTGCGCA CTAGAAAATG AAGATGTACA ATTCCAGAAA 
AAGAATACAG AGTCAAAAAA GTTAAGTCCA CCGGTGGTGG AGACACTCCC 
TACAGTTGAT TTGCATGAAG AGTCTTCCAA TGCAGTTGTG GACAGTGAAA 
CTGTTGAAAA TATTTCCAGC TCATCTACCT CAGAAATCAC TCCAATCTCA 
AAGCTTGATG AAATAGAAAA ATCTGGTACT ATTCCGATAG CCAAACCAAG 
TGAAACTGAG CAGTCTGAAA CTGATTGTGA TGTTGGTGAG GCCCTTGATG 
CTAGTGCTCC AATTGAACAA CCTTCCTTTG TCAGTCCACC TGACAGCCTT 
GTTGGCCAGC ATATAGAAAA TGTATCATCT TCACATGGTA AAGGAAAGAT 
AACAAAATCA GAATTTGAAT CAAAAGTTTC AGCAAGTGAA CAGGGCGGTG 
GTGATCCAAA ATCTGCATTG AATGCTTCAG ATAATTTAAA AAATGAGAGC 
TCTGATTATA CAAAACCAGG AGACATTGAC CCTACATCAG TAGCAAGTCC 
CAAAGATCCA GAAGATATAC CAACATTTGA TGAATGGAAG AAGAAAGTTA 
TGGAAGTAGA AAAAGAAAAA AGTCAGTCGA TGCATGCATC TTCTAATGGA 
GGTTCACATG CCACCAAAAA GGTCCAGAAA AATCGAAATA ATTATGCCTC 
AGTAGAATGT GGTGCCAAAA TTCTAGCAGC TAATCCAGAA GCCAAGAGCA 
CATCTGCTAT TCTTATAGAA AATATGGATC TTTACATGTT GAATCCTTGC 
AGCACTAAAA TTTGGTTTGT TATTGAACTT TGTGAACCAA TTCAAGTAAA 
ACAGCTTGAT ATTGCAAATT ATGAATTATT TTCTTCTACT CCTAAAGATT 
TTCTGGTTTC TATCAGTGAC AGATATCCAA CAAATAAGTG GATTAAGCTG 
GGTACTTTTC ATGGTAGAGA TGAGCGGAAT GTACAGAGTT TCCCTTTAGA 
TGAACAGATG TATGCAAAAT ATGTCAAGGT TGAGTTGCTA TCACATTTTG 
GATCAGAGCA CTTTTGTCCA TTAAGCCTTA TAAGGGTATT TGGCACTAAC 
ATGGTGGAAG AATATGAAGA AATTGCTGAT TCCCAGTATC ACTCAGAACG 
CCAGGAACTA TTTGATGAGG ACTATGATTA TCCACTGGAT TATAATACTG 
GAGAGGATAA ATCCTCAAAA AATCTTCTTG GTTCTGCTAC AAATGCCATT 
CTAAATATGG TGAATATTGC TGCTAATATT CTGGGAGCAA AAACTGAAGA 
CCTGACAGAA GGAAATAAAA GTATATCTGA GAATGCCACT GCCACAGCTG 
CACCTAAAAT GCCTGAATCA ACTCCTGTTT CAACTCCTGT TCCATCTCCT 
GAGTATGTAA CCACTGAAGT ACACACACAT GACATGGAGC CGTCAACACC 
AGATACTCCA AAAGAGAGTC CCATTGTACA GTTAGTTCAA GAGGAGGAAG 
AGGAGGCAAG TCCATCTACA GTGACCCTTC TGGGCAGCGG TGAACAGGAA 
GATGAATCAT CACCCTGGTT TGAGTCAGAG ACACAAATAT TTTGCAGTGA 
Figure 23(B) 



ACTGACCACA ATTTGTTGTA TTTCTAGTTT TTCAGAATAC ATATATAAAT 
GGTGTTCAGT TAGAGTTGCT CTTTATCGGC AGCGCAGCCG AACTGCTTTG 
AGTAAAGGAA AAGATTATCT TGTGTTAGCT CAACCACCCT TACTACTTCC 
TGCGGAATCA GTAGATGTTT CAGTATTGCA ACCTCTGAGT GGAGAATTGG 
AAAATACGAA TATAGAAAGG GAAGCTGAAA CTGTTGTTCT GGGTGATTTA 
AGTAGTAGTA TGCACCAGGA TGACTTGGTG AATCACACTG TAGATGCAGT 
TGAACTTGAA CCAAGCCATT CTCAAACTCT TTCTCAGTCT CTTCTTTTAG 
ATATTACCCC AGAAATCAAT CCCTTGCCTA AAATAGAAGT ATCTGAGTCT 
GTTGAATATG AGGCAGGACA TATACCATCA CCAGTGATTC CCCAAGAGAG 
TTCTGTTGAG ATCGATAATG AAACAGAACA AAAGTCTGAG AGCTTTAGTT 
CTATAGAGAA ACCATCTATT ACCTATGAAA CAAATAAAGT TAATGAGTTA 
ATGGATAATA TTATAAAAGA AGATATGAAC TCCATGCAAA TTTTCACAAA 
GCTGTCTGAA ACAATAGTGC CACCAATAAA TACAGCCACT GTACCCGACA 
ATGAAGATGG GGAAGCCAAA ATGAATATAG CTGACACAGC AAAGCAAACT 
TTGATTTCTG TTGTGGATTC TTCTTCATTA CCTGAAGTAA AAGAAGAAGA 
ACAGTCTCCA GAAGATGCCC TTTTGAGAGG GTTACAGAGG ACAGCTACAG 
ATTTTTATGC TGAATTGCAA AATTCTACAG ATCTAGGATA TGCTAATGGA 
AATCTTGTAC ATGGATCAAA CCAAAAGGAG TCAGTATTTA TGAGACTTAA 
TAATCGTATT AAAGCCTTAG AAGTTAACAT GTCTCTCAGT GGTCGCTATC 
TGGAGGAGCT TAGCCAAAGG TACCGAAAAC AAATGGAAGA AATGCAAAAG 
GCTTTCAACA AAACAATCGT GAAACTTCAG AATACTTCAA GAATAGCAGA 
GGAGCAGGAT CAGCGGCAAA CTGAAGCCAT CCAGTTGCTA CAGGCACAGC 
TGACCAACAT GACACAGCTT GTTTCAAATT TATCAGCAAC AGTAGCAGAA 
TTGAAACGGG AGGTTTCAGA TCGACAAAGC TATCTTGTCA TATCTTTGGT 
TCTTTGTGTT GTCTTGGGAC TGATGCTTTG TATGCAGCGT TGTCGAAATA 
CTTCTCAATT TGATGGAGAT TATATTTCAA AACTTCCTAA AAGTAATCAG 
TATCCAAGCC CTAAAAGGTG TTTCTCTTCC TATGATGATA TGAATTTGAA 
AAGAAGAACT TCATTCCCAC TCATGAGATC CAAGTCTCTA CAGTTAACTG 
GCAAAGAAGT AGACCCAAAT GATTTGTACA TTGTAGAACC CCTCAAGTTT 
TCTCCAGAAA AGAAGAAGAA GCGCTGCAAG TACAAAATTG AAAAAATTGA 
GACCATAAAG CCTGAAGAAC CATTGCACCC CATAGCCAAT GGCGACATAA 
AAGGAAGAAA GCCCTTTACG AACCAGAGAG ATTTTTCTAA TATGGGAGAA 
GTTTATCACT CTTCTTATAA AGGTCCTCCA TCTGAAGGAA GCTCAGAAAC 
TTCATCACAG TCAGAAGAGT CCTATTTTTG TGGCATTTCA GCTTGCACAA 
GTCTGTGCAA TGGACAGTCT CAAAAGACAA AAACTGAGAA GAGGGCTTTA 
AAACGAAGAC GATCTAAAGT CCAAGACCAA GGAAAATTGA TAAAAACTCT 
AATACAGACT AAGTCGGGAT CATTGCCGAG CCTGCATGAC ATAATCAAAG 
GAAACAAAGA GATCACCGTG GGAACATTTG GTGTTACAGC AGTCTCGGGA 






Figure 23(C) 



CATATCTAAA ATTAATTGAA CTTTTCATAC AGAAGACTTT TTTGTTGTTG 
TTCTTTGAAG AACAGTCTGT AGTATTTGAA GGGTTTGGGG GAGGGAGAAA 
ATATTAATGG GAAAGGCATT CAGAAATTAT GGTTTCTACC TTTTTAAAAA 
GTAGATGGGA TTGTGCTCAA TCTTGGTTAA TGAGCTACAG TTTTACAAAG 
CTGATCACTT CCTATAAGGA CAATGGTAGA CATTTTATAA AGATGTTTTT 
TCACAAGATT AATTACTGGG ACAAAAGTAA TTTGGAAGCC CAGTTCCTTA 
GGTGGGATAG GAATGAAAGC CTAAACCTCT TCCTTTAGCT TTGTTCCTAT 
TTCTTGCACC TTCCCATATT TATGTGCCTT TTGTCTATTT ATAATGCCAC 
TGGAAGAGGA GGGATAACTT TTTCTGTTAT TTGATTTCTT TTATAACTTT 
GTTAGGTTTT TGAAGCTGCA AACACTACAA TGCTTTGAGG GGGTCTGTGC 
CTGAAGCTCA GGAGTGTGGA TCAGACAGTC TAAAGATCCT AAAAACTTGC 
CAACTGGATC TTTGTTTAGC AAACTCACTG GAAATGAACA CTTAATGGAA 
TTTTTAAGTC TGTTCTGTTA GGTAGATGGT GATGCTCTTG TTATTTTCAC 
TTATTCAGGC TGGATTACTT CTTACTTAGT TACTAACTCA ATGAGGAAAA 
AATCCCTACA GGATCTTTTT TTGCAAACAA CTGATATATG CAGACAAATT 
TTTGACAAAT TCACCTTTTA AACACGACGT TAACCGATTT GTGAAGGTTT 
TCTTTAGCTT ACATTTTAAA CATACACAAT AAACACTAAT CCTCCAAACT 
TTCACTGTTT TTATTAGTAT GAATATAAAA TTTGAAGGTT TGGCCAATTA 
GTACAAGTCT CATGATATAA TCACAGCCTG CATACATATG CACAGATCCA 
GTTAGTGAGT TTGTCAAGCT TAATCTAATT GGTTAAGTCT AAAGAGATTA 
TTATTCCTTG ATGTTTGCTT TGTATTGGCT ACAAATGTGC AGAGGTAATA 
CATATGTGAT GTCGATGTCT CTGTCTTTTT TTTTGTCTTT AAAAAATAAT 
TGGCAGCAAC TGTATTTGAA TAAAATGATT TCTTAGTATG ATTGTACAGT 
AATGAATGAA AGTGGAACAT GTTTCTTTTT GAAAGGGAGA GAATTGACCA 
TTTATTGTTG TGATGTTTAA GTTATAACTT ATTGAGCACT TTTAGTAGTG 
ATAACTGTTT TTAAACTTGC CTAATACCTT TCTTGGGTAT TGTTTGTAAT 
GTGACTTATT TAACGCCTTC TTTGTTTGTT TAAGTTGCTG CTTTAGGTTA 
ACAGCGTGTT TTAGAAGATT TAAATTTCTT TCCTGTCTGC ACAATTAGCT 
ATTCAGAGCA AGAGGGCCTG ATTTTATAGA AGCCCCTTGA AAAGAGGTCC 
AGATGAGAGC AGAGATACAG TGAGAAATTA TGTGATCTGT GTGTTGTGGG 
AAGAGAATTT TCAATATGTA ACTACGGAGC TGTAGTGCCA TTAGAAACTG 
TGAATTTCCA AATAAATCTG AACACTTGTC TTTATT 

QRGLPGREPL RSRSASAIAL RTIGHILALL LRLLHLGLGS GGCREDVPPS 
GRGKKEEKMK KHRRALALVS CLFLCSLVWL PSWRVCCKES SSASASSYYS 
QDDNCALENE DVQFQKKNTE SKKLSPPVVE TLPTVDLHEE SSNAVVDSET 
VENISSSSTS EITPISKLDE IEKSGTIPIA KPSETEQSET DCDVGEALDA 
SAPIEQPSFV SPPDSLVGQH IENVSSSHGK GKITKSEFES KVSASEQGGG 
DPKSALNASD NLKNESSDYT KPGDIDPTSV ASPKDPEDIP TFDEWKKKVM 
EVEKEKSQSM HASSNGGSHA TKKVQKNRNN YASVECGAKI LAANPEAKST 
SAILIENMDL YMLNPCSTKI WFVIELCEPI QVKQLDIANY ELFSSTPKDF 
LVSISDRYPT NKWIKLGTFH GRDERNVQSF PLDEQMYAKY VKVELLSHFG 
SEHFCPLSLI RVFGTNMVEE YEEIADSQYH SERQELFDED YDYPLDYNTG 
EDKSSKNLLG SATNAILNMV NIAANILGAK TEDLTEGNKS ISENATATAA 
PKMPESTPVS TPVPSPEYVT TEVHTHDMEP STPDTPKESP IVQLVQEEEE 
EASPSTVTLL GSGEQEDESS PWFESETQIF CSELTTICCI SSFSEYIYKW 
CSVRVALYRQ RSRTALSKGK DYLVLAQPPL LLPAESVDVS VLQPLSGELE 
NTNIEREAET VVLGDLSSSM HQDDLVNHTV DAVELEPSHS QTLSQSLLLD 
ITPEINPLPK IEVSESVEYE AGHIPSPVIP QESSVEIDNE TEQKSESFSS 
IEKPSITYET NKVNELMDNI IKEDMNSMQI FTKLSETIVP PINTATVPDN 
EDGEAKMNIA DTAKQTLISV VDSSSLPEVK EEEQSPEDAL LRGLQRTATD 
FYAELQNSTD LGYANGNLVH GSNQKESVFM RLNNRIKALE VNMSLSGRYL 
EELSQRYRKQ MEEMQKAFNK TIVKLQNTSR IAEEQDQRQT EAIQLLQAQL 
TNMTQLVSNL SATVAELKRE VSDRQSYLVI SLVLCVVLGL MLCMQRCRNT 
SQFDGDYISK LPKSNQYPSP KRCFSSYDDM NLKRRTSFPL MRSKSLQLTG 
KEVDPNDLYI VEPLKFSPEK KKKRCKYKIE KIETIKPEEP LHPIANGDIK 
GRKPFTNQRD FSNMGEVYHS SYKGPPSEGS SETSSQSEES YFCGISACTS 
LCNGQSQKTK TEKRALKRRR SKVQDQGKLI KTLIQTKSGS LPSLHDIIKG 
NKEITVGTFG VTAVSGHI*N *LNFSYRRLF CCCSLKNSL* YLKGLGEGEN 
INGKGIQKLW FLPF*KVDGI VLNLG**ATV LQS*SLPIRT MVDIL*RCFF 
TRLITGTKVI WKPSSLGGIG MKA*TSSFSF VPISCTFPYL CAFCLFIMPL 
EEEG*LFLLF DFFYNFVRFL KLQTLQCFEG VCA*SSGVWI RQSKDPKNLP 
TGSLFSKLTG NEHLMEFLSL FC*VDGDALV IFTYSGWITS YLVTNSMRKK 
SLQDLFLQTT DICRQIFDKF TF*TRR*PIC EGFL*LTF*T YTINTNPPNF 
HCFY*YEYKI *RFGQLVQVS *YNHSLHTYA QIQLVSLSSL I*LVKSKEII 
IP*CLLCIGY KCAEVIHM*C RCLCLFFCL* KIIGSNCI*I K*FLSMIVQ* 
*MKVEHVSF* KGEN*PFIVV MFKL*LIEHF ****LFLNLP NTFLGYCL*C 
DLFNAFFVCL SCCFRLTACF RRFKFLSCLH N*LFRARGPD FIEAP*KEVQ 
MRAEIQ*EIM *SVCCGKRIF NM*LRSCSAI RNCEFPNKSE HLSL 
Figure 25(A) 



TAGAATTCAG CGGCCGCTGA ATTCTAGCTG CGGGGTAGGA GTCCGCGGCA 
GCCTCCGGGT AAGCCAAGCG CCGCGCAGTG CTGAGTTCCC GCACGCCGCA 
GAGCCATGGA GATCGGCACC GAGACCAGCC GCAAGATCCG GAGTGCCATT 
AAGGGGAAAT TACAAGAATT AGGAGCTTAT GTTGATGAAG AACTTCCTGA 
TTACATTATG GTGATGGTGG CCAACAAGAA AAGTCAGGAC CAAATGACAG 
AGGATCTGTC CCTGTTTCTA GGGAACAACA CAATTCGATT CACCGTATGG 
CTTCATGGTG TATTAGATAA ACTTCGCTCT GTTACAACTG AACCCTCTAG 
TCTGAAGTCT TCTGATACCA ACATCTTTGA TAGTAACGTG CCTTCAAACA 
AGAACAATTT CAGTCGGGGA GATGAGAGGA GGCATGAAGC TGCAGTGCCA 
CCACTTGCCA TTCCTAGCGC GAGACCTGAA AAAAGAGATT CCAGAGTTTC 
TACAAGTTCG CAGGAGTCAA AAACCACAAA TGTCAGACAG ACTTACGATG 
ATGGAGCTGC AACCCGACTA ATGTCAACAG TGAAACCTTT GAGGGAGCCA 
GCACCCTCTG AAGATGTGAT TGATATTAAG CCAGAACCAG ATGATCTCAT 
TGACGAAGAC CTCAACTTTG TGCAGGAGAA TCCCTTATCT CAGAAAGAAC 
CTACAGTGAC ACTTACATAT GGTTCTTCTC GCCCTTCTAT TGAAATTTAT 
CGACCACCTG CAAGTAGAAA TGCAGATAGT GGTGTTCATT TAAACAGGTT 
GCAATTTCAA CAGCAGCAGA ATAGTATTCA TGCTGCCAAG CAGCTTGATA 
TGCAGAGTAG TTGGGTATAT GAAACAGGAC GTTTGTGTGA ACCAGAGGTG 
CTTAACAGCT TAGAAGAAAC GTATAGTCCG TTCTTTAGAA ACAACTCGGA 
GAAAATGAGT ATGGAGGATG AAAACTTTCG GAAGAGAAAG TTGCCTGTGG 
TAAGTTCAGT TGTTAAAGTA AAAAAATTCA ATCATGATGG AGAAGAGGAG 
GAAGGAGATG ATGATTACGG GTCTCGAACA GGAAGCATCT CCAGCAGTGT 
GTCTGTGCCT GCAAAGCCTG AAAGGAGACC TTCTCTTCCA CCTTCTAAAC 
AAGCTAACAA GAATCTGATT TTGAAGGCTA TATCTGAAGC TCAAGAATCC 
GTAACAAAAA CAACTAACTA CTCTACAGTT CCACAGAAAC AGACACTTCC 
AGTTGCTCCC AGAACTCGAA CTTCTCAAGA AGAATTGCTA GCAGAAGTGG 
TCCAGGGACA AAGTAGGACC CCCAGAATAA GTCCCCCCAT TAAAGAAGAG 
GAAACAAAAG GAGATTCTGT AGAAAAAAAT CAAGCTGAGA TGAGTGAACT 
GAGTGTGGCA CAGAAACCAG AAAAACTTTT GGAGCGCTGC AAGTACTGGC 
CTGCTTGTAA AAATGGGGAT GAGTGTGCCT ACCATCACCC CATCTCACCC 
TGCAAAGCCT TCCCCAATTG TAAATTTGCT GAAAAATGTT TGTTTGTTCA 
CCCAAATTGT AAATATGATG CAAAGTGTAC TAAACCAGAT TGTCCCTTCA 
CTCATGTGAG TAGAAGAATT CCAGTACTGT CTCCAAAACC AGTTGCACCA 
CCAGCACCAC CTTCCAGTAG TCAGCTCTGC CGTTACTTCC CTGCTTGTAA 
GAAGATGGAA TGTCCCTTCT ATCATCCAAA ACATTGTAGG TTTAACACTC 
AATGTACAAG TCCGGACTGC ACATTCTACC ATCCCACCAT TAATGTCCCA 
CCACGACATG CCTTGAAATG GATTCGACCT CAAACCAGCG AATAGCACCC 
AGTCCTGCCT GGCAGAAGAT CATGCAGTTT GGAAGTTTTC ATGTACTGAT 

GAAAGATACT CTACAGAACT TGTCAAATCT TTGAAACTTG GAATATATTG 
CTTTCATAAT ATGAAGTTTT ATTGCCTATC TATCTGAAGT GTCTAATTTT 
TCAAGTTTGT AAGTTTATTA TGTGGTTTTA ACATTGGGTG TTTTTGTTTT 
GTTTTTACTA TGAAAAGACA GCTTAAGGAA GAGCTAAATT CTGTTAAAAT 
ATTTGGGGCA TGTTTGTGCA CTGCTGTTGT GAGGATCAGC ATATGAAATT 
GACATCATGG TTAGTCATGG TACTGCAGCT TAGGGGGCTA CACGGTTGCT 
GTGTGAGTGG AGAGATGCAG TGAGGCAGTT GTCATTATTC TAAAAATTGT 
ACTACTTTCA CTTTTCCCAA AGATTATATA ATGTTCATAA TCCACCATGA 
AAACAGCATT GGCCAAAGGT ACTGAGGCTG CTTAAAATAT TCAATTCTGC 
TTTTTAATTT TTAAGTGAAT TTAGTTTGAA AAGCATGATT ATACAGGCCT 
CTCAGGCTGA GTGCTACTTT CGGTAAAGTT CCAGTTTTCC TGCCTTCTGT 
GACAGGATGA ATGAGGTGGG TATGGACAGT GGAGGCAGCT GGAATGGCAA 
GTGCAGAAAA TAGGAACAGT TCTATACAGT GCTCTCATTT ACTAATAACA 
TAATGCCTTC TAAATAATTT TTTTGGGAAA CTACATTATC ACAAAATTAT 
ACAAATTTTT TTACAAGTAT TTACATACTG TATCTGAAAA CAGACTTTAA 
AGTCACAAGA TTATAAATGT ACATATGTAT TCTCACATTC TGAAAAATAA 
CATTCTCAGA ATCCACAGAA AATATACTTA GTTACTACTG AAGATAATTT 
TTGAAATGTA AAAATTAGAT TTAAATAGTA TATTTTAAAT GACAGAACTA 
TAATTACAGA GATCAGATCA GATAGGTAAA CTGCAAGATA GATAGGATGA 
AACTTTTGGC CTACTGTATT ACTTACAGAG TTTTTTTGTG TGTGGTTTTT 
AAAACTGTTA AGGCAAGAAG TGTCAAATGC TTTAGAGTTA AATAACAGAT 
CACTGATTTC AAAGACTTGG TGTATAGTGT TAAAAATTAA AGCTTAAAAG 
GTGGTTAGAA AAGTGGATTA ATGCAAAAGG GGTAATAAAG ACTGCAACAT 
TCTCAGGACC AAATTAAACT GCTAA 

*NSAAAEF*L RGRSPRQPPG KPSAAQC*VP ARRRAMEIGT ETSRKIRSAI 
KGKLQELGAY VDEELPDYIM VMVANKKSQD QMTEDLSLFL GNNTIRFTVW 
LHGVLDKLRS VTTEPSSLKS SDTNIFDSNV PSNKNNFSRG DERRHEAAVP 
PLAIPSARPE KRDSRVSTSS QESKTTNVRQ TYDDGAATRL MSTVKPLREP 
APSEDVIDIK PEPDDLIDED LNFVQENPLS QKEPTVTLTY GSSRPSIEIY 
RPPASRNADS GVHLNRLQFQ QQQNSIHAAK QLDMQSSWVY ETGRLCEPEV 
LNSLEETYSP FFRNNSEKMS MEDENFRKRK LPVVSSVVKV KKFNHDGEEE 
EGDDDYGSRT GSISSSVSVP AKPERRPSLP PSKQANKNLI LKAISEAQES 
VTKTTNYSTV PQKQTLPVAP RTRTSQEELL AEVVQGQSRT PRISPPIKEE 
ETKGDSVEKN QAEMSELSVA QKPEKLLERC KYWPACKNGD ECAYHHPISP 
CKAFPNCKFA EKCLFVHPNC KYDAKCTKPD CPFTHVSRRI PVLSPKPVAP 
PAPPSSSQLC RYFPACKKME CPFYHPKHCR FNTQCTSPDC TFYHPTINVP 
PRHALKWIRP QTSE*HPVLP GRRSCSLEVF MY**KILYRT CQIFETWNIL 
LS*YEVLLPI YLKCLIFQVC KFIMWF*HWV FLFCFYYEKT A*GRAKFC*N 
IWGMFVHCCC EDQHMKLTSW LVMVLQLRGL HGCCVSGEMQ *GSCHYSKNC 
TTFTFPKDYI MFIIHHENSI GQRY*GCLKY SILLFNF*VN LV*KA*LYRP 
LRLSATFGKV PVFLPSVTG* MRWVWTVEAA GMASAENRNS SIQCSHLLIT 
*CLLNNFFGK LHYHKIIQIF LQVFTYCI*K QTLKSQDYKC TYVFSHSEK* 
HSQNPQKIYL VTTEDNF*NV KIRFK*YILN DRTIITEIRS DR*TAR*IG* 
NFWPTVLLTE FFCVWFLKLL RQEVSNALEL NNRSLISKTW CIVLKIKA*K 
VVRKVD*CKR GNKDCNILRT KLNC* 



MEIGT ETSRKIRSAI 

KGKLQELGAY VDEELPDYIM VMVANKKSQD QMTEDLSLFL GNNTIRFTVW 
LHGVLDKLRS VTTEPSSLKS SDTNIFDSNV PSNKNNFSRG DERRHEAAVP 
PLAIPSARPE KRDSRVSTSS QESKTTNVRQ TYDDGAATRL MSTVKPLREP 
APSEDVIDIK PEPDDLIDED LNFVQENPLS QKEPTVTLTY GSSRPSIEIY 
RPPASRNADS GVHLNRLQFQ QQQNSIHAAK QLDMQSSWVY ETGRLCEPEV 
LNSLEETYSP FFRNNSEKMS MEDENFRKRK LPVVSSVVKV KKFNHDGEEE 
EGDDDYGSRT GSISSSVSVP AKPERRPSLP PSKQANKNLI LKAISEAQES 
VTKTTNYSTV PQKQTLPVAP RTRTSQEELL AEVVQGQSRT PRISPPIKEE 
ETKGDSVEKN QAEMSELSVA QKPEKLLERC KYWPACKNGD ECAYHHPISP 
CKAFPNCKFA EKCLFVHPNC KYDAKCTKPD CPFTHVSRRI PVLSPKPVAP 
PAPPSSSQLC RYFPACKKME CPFYHPKHCR FNTQCTSPDC TFYHPTINVP 
PRHALKWIRP QTSE 